Useful books

The books by Shreve [12, 13] are excellent on (respectively) probabilistic aspects of the
binomial model and on stochastic calculus for finance. Etheridge [4] is a good stochastic
calculus primer for finance, Björk [1] covers many finance topics outside the scope of the
course, Wilmott et al. [15] is good on the PDE aspects of the subject, and background on
financial derivatives is given in Hull [7]. Jacod and Protter [8] and Grimmett and Stirzaker
[5] are useful for background probability material.

The lecture notes and background material 

These notes contain the core material. Some material is marked with an asterisk and is not
examinable. Some probability theory underlying conditional expectation and martingales
is contained in the supplementary notes Background Probability, available on the course
website. These are for those who wish to brush up on some probabilistic material, and are (I
hope) helpful in developing intuition for notions like filtrations and adaptedness of stochastic
processes, using the binomial stock price model as an example.

Below are some comments on the level of mathematical knowledge you will need to
acquire. Its application to finance, as exemplified by the binomial model and the
Black-Scholes model, is fundamental to the course.

The Background Probability material is not examinable. You are expected to become
familiar with the use of some probabilistic terminology ($\sigma$-algebras, filtrations, random
variables that are measurable with respect to a $\sigma$-algebra). You are expected to have some
familiarity with the properties of conditional expectation (you will not be examined on proofs
of these) and martingales, and to be able to use them.

You are expected to know the defining properties of a stochastic process $B = (B_t)_{t \ge 0}$
known as Brownian motion (BM), and to understand how these lead to the fact that its
quadratic variation (QV) process $[B]$ is equal to the time elapsed: $[B]_t = t$. You are expected
to know (but not prove) Lévy's criterion: any continuous martingale $M$ with $M_0 = 0$
satisfying $[M]_t = t$ is a BM.
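The quadratic variation property can be seen numerically. The sketch below assumes only the defining property that Brownian increments over a grid of spacing $\delta t$ are independent $N(0, \delta t)$ random variables; it sums the squared increments of a simulated path on $[0, t]$, and as the grid is refined the sum should settle near $t$. The function names are illustrative, not part of the notes.

```python
import math
import random


def brownian_increments(t, n_steps, rng):
    """Independent N(0, dt) increments of a Brownian path on [0, t], dt = t / n_steps."""
    dt = t / n_steps
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def quadratic_variation(increments):
    """Sum of squared increments: a discrete approximation to [B]_t."""
    return sum(dB * dB for dB in increments)


rng = random.Random(42)  # fixed seed so the run is reproducible
for n_steps in (10, 1_000, 100_000):
    qv = quadratic_variation(brownian_increments(1.0, n_steps, rng))
    print(f"{n_steps:>7} steps: sum of squared increments = {qv:.4f}")
```

The spread of the sum around $t$ is $t\sqrt{2/n}$ for $n$ steps, so a coarse grid can be visibly off while the finest grid is close to 1.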

You should be able to use the properties of Brownian motion (such as its independent
Gaussian increments property and its quadratic variation property) and you should have some
appreciation of how these lead to the properties of the Ito integral (such as the martingale
property and the Ito isometry) for elementary integrands (though you are not required to
know the full theory of the construction of the Ito integral for general integrands).
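For an elementary integrand, the martingale property and the Ito isometry can be checked by simulation. This sketch assumes the integrand $H_t = B_{t_j}$ on $(t_j, t_{j+1}]$ for a fixed grid $0 = t_0 < \dots < t_m = T$; it is adapted because $B_{t_j}$ is known at the left end of each interval. Then $I = \sum_j B_{t_j}(B_{t_{j+1}} - B_{t_j})$ should satisfy $\mathbb{E}[I] = 0$ and $\mathbb{E}[I^2] = \mathbb{E}\int_0^T H_t^2\,dt = \sum_j t_j(t_{j+1} - t_j)$. The grid and function names are illustrative choices.

```python
import math
import random


def simulate_elementary_integral(grid, rng):
    """One sample of I = sum_j B_{t_j} (B_{t_{j+1}} - B_{t_j}) and of int_0^T H_t^2 dt.

    The integrand H_t = B_{t_j} on (t_j, t_{j+1}] is fixed before the increment
    over that interval is drawn, so it is adapted (elementary).
    """
    b = 0.0              # current value B_{t_j}, starting from B_0 = 0
    integral = 0.0       # running Ito sum
    time_integral = 0.0  # running value of int H_t^2 dt
    for t_j, t_next in zip(grid[:-1], grid[1:]):
        dt = t_next - t_j
        dB = rng.gauss(0.0, math.sqrt(dt))
        integral += b * dB
        time_integral += b * b * dt
        b += dB
    return integral, time_integral


def monte_carlo_check(grid, n_paths, seed=1):
    """Sample averages of I, I^2 and int H_t^2 dt over n_paths simulated paths."""
    rng = random.Random(seed)
    sum_I = sum_I2 = sum_H2 = 0.0
    for _ in range(n_paths):
        I, H2 = simulate_elementary_integral(grid, rng)
        sum_I += I
        sum_I2 += I * I
        sum_H2 += H2
    return sum_I / n_paths, sum_I2 / n_paths, sum_H2 / n_paths


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
exact = sum(t_j * (t_next - t_j) for t_j, t_next in zip(grid[:-1], grid[1:]))
print("exact E[I^2]:", exact)  # 0.375 for this grid
print("estimates of E[I], E[I^2], E[int H^2 dt]:", monte_carlo_check(grid, 100_000))
```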

You are expected to have an appreciation of how the quadratic variation property leads 
to the Ito formula and to properties of the Ito integral (that is, stochastic calculus). You are 
expected to be able to use the Ito formula (both the one-dimensional and multi-dimensional
versions) fluently. You are expected to understand (and prove using the Ito formula) the 
connection between PDEs and stochastic calculus, in the form of the Feynman-Kac theorem. 
Part I

Introduction to derivative securities 

1 Financial derivatives 

Definition 1.1. A European derivative security (or European contingent claim) is a financial
contract which pays its holder a random amount (the payoff of the claim) at some future
time $T$ (the maturity time of the derivative).

An American derivative delivers the payoff at a random time $\tau \le T$ chosen by the holder
of the contract. The payoff is (typically) contingent on the value of some other underlying
security (or securities), or on the level of some non-traded reference index.

Example 1.2 (Forward contract). The holder of a forward contract agrees to buy an asset at
some future time $T$ for a fixed price $K$ (the delivery price) that is decided at initiation of
the contract. Hence, the forward contract has a value (to the holder) at maturity of $S_T - K$,
where $S_T$ is the underlying asset value at maturity.

The forward contract thus allows the holder to fix the purchase price of the underlying 
asset in advance, and so can be used to mitigate the risk inherent in the price uncertainty 
(that is, to hedge the price risk). It can also be used to speculate on future price moves.

Example 1.3 (European call option). A European call option has payoff $(S_T - K)^+$ at
maturity. It confers on the owner the right (but not the obligation) to buy the underlying asset
at maturity time $T$ for a fixed price $K$ (the strike price, or exercise price).

The origins of derivatives lie in medieval agreements between farmers and merchants to 
trade the farmer's harvest at some future date, at a price set in advance. This allowed 
farmers to fix the selling price of their crop, and reduced the risk of having to sell at a lower 
price than their cost of production, which might happen in a bumper harvest year. This is 
one motivation for the existence of derivatives: they give random payoffs which can be used 
to eliminate uncertainty from future asset price trades. The act of removing uncertainty in 
finance is called hedging. 

Consider a farmer whose unit cost of crop production is $C$. His profit on selling the crop
would be $S_T - C$, where $S_T$ is the market crop price at harvest time $T$. If the farmer were to
sell (that is, take a short position in) a forward contract with delivery price $K > C$, at some
time $t < T$, then his overall payoff at $T$ would be $S_T - C - (S_T - K) = K - C > 0$. The risk
of the crop price being less than the cost of production has been removed.
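The farmer's hedge can be tabulated directly. In this minimal sketch the unit cost $C = 80$ and delivery price $K = 100$ are hypothetical numbers chosen for illustration; the hedged profit comes out equal to $K - C$ whatever the harvest price $S_T$.

```python
def farmer_profit(S_T, C, K, hedged):
    """Farmer's profit per unit of crop at harvest time T.

    Unhedged: sell the crop at the market price S_T.
    Hedged: also hold a short forward with delivery price K, whose
    payoff to the short side is -(S_T - K).
    """
    profit = S_T - C
    if hedged:
        profit -= S_T - K
    return profit


C, K = 80.0, 100.0  # hypothetical unit cost and delivery price
for S_T in (50.0, 90.0, 130.0):
    print(S_T, farmer_profit(S_T, C, K, hedged=False), farmer_profit(S_T, C, K, hedged=True))
# unhedged profits are -30.0, 10.0, 50.0; the hedged profit is 20.0 = K - C each time
```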

Since derivatives have random payoffs, they can also be used to take risk by speculating
on the future values of asset prices, and they are often a cheaper device for doing so than
investing in the underlying asset. For instance, a European call option on a stock, with payoff
$(S_T - K)^+$, where $S_T$ is the (random) stock price at time $T$ and $K > 0$ is a constant called
the strike price of the option, gives the holder of the call a positive payoff if the stock
price is above $K$, and the cost of acquiring a call option is usually only a fraction of the cost
of buying the stock itself.

This course will be about how to assign a value to a derivative at any time $t \le T$. This
will involve modelling the randomness in the underlying asset price process $S = (S_t)_{0 \le t \le T}$.
To do this, we will need the notion of a stochastic process on a filtered probability space. We
shall see that the key to valuing derivatives is to attempt to use the underlying asset to
remove the risk from selling (or buying) the derivative. That is, derivative valuation is via
a hedging argument.

1.1 Underlying assets 

Typical assets which are traded in financial markets, and which can be the underlying assets 
for a derivative contract, include: 

• shares (stocks) 

• commodities (metals, oil, other physical products) 

• currencies 

• bonds (assets used as borrowing tools by governments and companies) which pay fixed
amounts at regular intervals to the bond holder.

An agent who holds an asset will be said to hold a long position in the asset, or to be 
long in the asset. 

An agent who has sold an asset will be said to hold a short position in the asset, or to be 
short in the asset. 

For the most part in this course, we will focus on derivative securities which have a stock
as underlying asset. The stock price will be a stochastic process denoted by $S = (S_t)_{0 \le t \le T}$
on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{0 \le t \le T}, \mathbb{P})$. This means that for each $t \in [0, T]$,
$S_t$, the value of the stock at time $t$, is a random variable that is measurable with respect to
the $\sigma$-algebra $\mathcal{F}_t$. That is, $S$ is a process that is adapted to the filtration $\mathbb{F}$. We shall see
later that this means the following: each $\mathcal{F}_t$ is a collection of subsets of a set $\Omega$ (the sample
space), closed under complements and under countable unions, and with $\mathcal{F}_s \subseteq \mathcal{F}_t$ for $s \le t$
(such an increasing family of $\sigma$-algebras is called a filtration, and will represent increasing
information as time evolves). Each $S_t$ is a function from $\Omega$ to $\mathbb{R}_+$ with the property that sets
of the form

$$\{\omega \in \Omega : S_t(\omega) \in A\}, \qquad A \subseteq \mathbb{R} \text{ a Borel set},$$

lie in $\mathcal{F}_t$. This is what we mean by saying that $S_t$ is an $\mathcal{F}_t$-measurable random variable. The
adaptedness property ($S_t$ is $\mathcal{F}_t$-measurable for each $t \in [0, T]$) is tantamount to the idea that
the information available at time $t$ is sufficient to know the value of $S_t$ (that is, if you observe
the stock market up to time $t$, you will know the current value of the stock price).

1.2 Interest rates and time value of money 

Let us measure time in some convenient units, say years. If an interest rate $r$ is quoted per
annum and with compounding frequency at time intervals $\delta t$, this means that an amount $A$
invested for a time period $\delta t$ will grow to $A(1 + r\,\delta t)$. If this is re-invested for another period
$\delta t$, the balance becomes $A(1 + r\,\delta t)^2$, and so on. So after $n$ periods, with $t := n\,\delta t$, we have
$A(1 + r\,\delta t)^n = A(1 + rt/n)^n$. A continuously compounded interest rate corresponds to the
limit $n \to \infty$, or $\delta t \to 0$. In this case, after time $t$ an amount $A$ will grow to

$$\lim_{n \to \infty} A\left(1 + \frac{rt}{n}\right)^n = Ae^{rt}.$$

So an amount $A$ invested at time zero for a time $t$ will grow to an amount $Ae^{rt}$, where $r$
is the continuously compounded risk-free interest rate. We call the amount $Ae^{rt}$ the future
value of $A$ invested at time zero, and the factor $e^{rt}$ is called an accumulation factor.

By the same token, receiving an amount $A$ at time $t$ is equivalent to receiving $Ae^{-rt}$ at
time zero. We call $Ae^{-rt}$ the present value of $A$ received at time $t$ (we say that $A$ is discounted
to the present) and the factor $e^{-rt}$ is called a discount factor.
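A quick numerical check of the limit and of the accumulation and discount factors. The amount $A = 100$, rate $r = 5\%$ and horizon $t = 2$ years are illustrative values, not taken from the notes.

```python
import math


def compounded_value(A, r, t, n):
    """Value at time t of A invested at rate r, compounded n times over [0, t]."""
    return A * (1.0 + r * t / n) ** n


def future_value(A, r, t):
    """Continuous compounding: A e^{rt}."""
    return A * math.exp(r * t)


def present_value(A, r, t):
    """Value at time zero of A received at time t: A e^{-rt}."""
    return A * math.exp(-r * t)


A, r, t = 100.0, 0.05, 2.0
for n in (1, 4, 12, 365, 100_000):
    print(f"n = {n:>6}: {compounded_value(A, r, t, n):.6f}")
print(f"limit:      {future_value(A, r, t):.6f}")       # 110.517092
print(f"PV of 100 paid at t: {present_value(A, r, t):.6f}")  # 90.483742
```

The compounded values increase with $n$ and approach $Ae^{rt}$ from below.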

It is usually convenient (but nothing more) to assume that interest is continuously
compounded. We do not need to assert that, in reality, interest is continuously compounded, in
order to use a continuously compounded interest rate in all our analysis. If the interest is
actually compounded $m$ times a year at an interest rate of $R$ per annum, then we can still
use a continuously compounded interest rate $r$ simply by making the identification

$$A\left(1 + \frac{R}{m}\right)^{mt} = A\exp(rt), \tag{1.1}$$

that is, $r = m\log(1 + R/m)$, so that there is a one-to-one correspondence between the interest rate $R$ (compounded $m$
times per annum) and the continuously compounded interest rate $r$. In this course, we will
nearly always use continuously compounded rates when considering continuous time models.
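Taking logarithms in (1.1) gives $r = m\log(1 + R/m)$ and, inversely, $R = m(e^{r/m} - 1)$. A small sketch of this correspondence, with a 10% rate compounded quarterly used purely as an example:

```python
import math


def continuous_from_periodic(R, m):
    """Continuously compounded rate r with (1 + R/m)^(m t) = exp(r t)."""
    return m * math.log1p(R / m)


def periodic_from_continuous(r, m):
    """Rate R, compounded m times a year, equivalent to the continuous rate r."""
    return m * math.expm1(r / m)


R, m = 0.10, 4  # 10% per annum compounded quarterly (example values)
r = continuous_from_periodic(R, m)
print(f"{r:.6f}")                                # 0.098770
print(f"{periodic_from_continuous(r, m):.6f}")   # back to 0.100000
```

`math.log1p` and `math.expm1` are used instead of `log(1 + x)` and `exp(x) - 1` because they stay accurate when $R/m$ or $r/m$ is small.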

A differential version of the above arguments is as follows. In continuous time, we model
the time evolution of cash in a bank account in terms of a riskless asset which we shall call
a money market account, which is the value at time $t \ge 0$ of \$1 invested at time zero and
continuously earning interest which is reinvested. We shall denote the value of this asset at
time $t$ by $B_t$, which satisfies

$$dB_t = rB_t\,dt, \qquad B_0 = 1, \tag{1.2}$$

where $r$ is the (assumed constant) interest rate. Then the value of the bank account at time
$t$ is given by $B_t = e^{rt}$, which we see is the accumulation factor we encountered above.

A more complex model could assume that interest rates are time-varying (possibly
stochastic). In this case the money market account would satisfy

$$dB_t = r_tB_t\,dt, \qquad B_0 = 1, \tag{1.3}$$

where $r_t$ is the instantaneous (or short-term) interest rate. We have allowed for this to be
time-varying, and $r_t$ represents the interest rate in the time interval $[t, t + dt)$. From (1.3)
we see that

$$B_t = \exp\left(\int_0^t r_u\,du\right), \tag{1.4}$$

and this is the accumulation factor in this case. This is the factor by which \$1 invested at
time zero has grown by time $t$, when the interest generated is continually reinvested.
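For a deterministic time-varying rate, (1.4) can be compared with a direct numerical solution of (1.3). The linear short-rate curve $r_t = 0.03 + 0.01t$ below is a hypothetical example; with it, $\int_0^t r_u\,du = 0.03t + 0.005t^2$, so at $t = 5$ both computations should approach $e^{0.275}$.

```python
import math


def short_rate(t):
    """Hypothetical deterministic short-rate curve r_t (illustration only)."""
    return 0.03 + 0.01 * t


def accumulation_euler(rate, t, n_steps):
    """Solve dB_u = r_u B_u du, B_0 = 1, on [0, t] by the explicit Euler scheme."""
    du = t / n_steps
    B = 1.0
    for k in range(n_steps):
        B += rate(k * du) * B * du
    return B


def accumulation_closed_form(rate, t, n_steps=10_000):
    """B_t = exp(int_0^t r_u du), with the integral evaluated by the trapezoidal rule."""
    du = t / n_steps
    integral = sum(0.5 * (rate(k * du) + rate((k + 1) * du)) * du for k in range(n_steps))
    return math.exp(integral)


print(accumulation_euler(short_rate, 5.0, 100_000))
print(accumulation_closed_form(short_rate, 5.0))  # both close to exp(0.275), about 1.3165
```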
1.3 Forwards and futures

Definition 1.4 (Forward contract). A forward contract obliges its holder to buy an
underlying asset (a stock, say) at some future time $T$ (the maturity time) for a price $K$ (the delivery
price) that is fixed at contract initiation. Hence, at time $T$, when the stock price is $S_T$, the
contract is worth $S_T - K$ (the payoff of the forward) to the holder. This payoff is shown in
Figure 1.



Figure 1: Forward contract payoff as function of final underlying asset price 

A futures contract is a rather specialised forward contract, traded on an organised
exchange, and such that, if a contract is traded at some time $t < T$, the delivery price is set
to a special value $F_{t,T}$, called the futures price of the asset or the forward price of the asset,
chosen so that the value of the futures contract at initiation (that is, at time $t$) is zero.

Futures markets have other specialised features that we will encounter later. Principally,
a participant in a futures market is required to set up a so-called margin account as collateral,
so that one's daily profits and losses are reflected by adjustments in the margin account.
One also has to maintain the balance in the margin account at some minimum value (the
maintenance margin), and receives a so-called margin call (a demand to top up the margin
account) if the balance in the margin account falls below the maintenance margin. This
mechanism is designed to remove the risk of default from the market, and hence futures
markets are very liquid. One can consult Hull [7] for detailed descriptions of the workings of
futures exchanges.

1.3.1 Valuation of forward contracts 

In what follows we value forward contracts on a non-dividend-paying stock, that is, an asset
with price process $S = (S_t)_{0 \le t \le T}$ that pays no income to its holder.

Lemma 1.5. The value at time $t \le T$ of a forward contract with delivery price $K$ and
maturity $T$, on an asset with price process $S = (S_t)_{0 \le t \le T}$, is $f_{t,T} = f(t, S_t) \equiv f(t, S_t; K, T)$, given by

$$f_{t,T} = S_t - K\exp(-r(T - t)), \qquad 0 \le t \le T. \tag{1.5}$$
Proof. This is a simple hedging argument which provides our first example of a riskless
hedging strategy. Start with zero wealth at time $t < T$, and sell the contract at time $t$
for some price $f_{t,T}$. Hedge this sale by purchasing the asset for price $S_t$. This requires
borrowing of $S_t - f_{t,T}$.

At time $T$, sell the asset for price $K$ under the terms of the forward contract, and require
that this is enough to pay back the loan. Hence we must have

$$K = (S_t - f_{t,T})\exp(r(T - t)),$$

and the result follows.

□ 

Corollary 1.6. The forward price of the asset at time $t \le T$ is given by

$$F_{t,T} = S_t\exp(r(T - t)), \qquad 0 \le t \le T.$$

Proof. Set $f_{t,T} = 0$ in Lemma 1.5; then by definition we must have $K = F_{t,T}$.

□ 
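Lemma 1.5 and Corollary 1.6 translate into two one-line functions. In this sketch $\tau = T - t$ denotes the time to maturity, and the numerical inputs are illustrative.

```python
import math


def forward_value(S_t, K, r, tau):
    """Lemma 1.5: value to the holder, f_{t,T} = S_t - K e^{-r tau}, tau = T - t."""
    return S_t - K * math.exp(-r * tau)


def forward_price(S_t, r, tau):
    """Corollary 1.6: the delivery price F_{t,T} = S_t e^{r tau} giving zero value."""
    return S_t * math.exp(r * tau)


S_t, r, tau = 100.0, 0.05, 0.5
F = forward_price(S_t, r, tau)
print(f"{F:.4f}")                                  # 102.5315
print(f"{forward_value(S_t, F, r, tau):.10f}")     # zero up to rounding
print(f"{forward_value(S_t, 95.0, r, tau):.4f}")   # 7.3456
```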

1.4 Arbitrage 

The simple argument above for valuing a forward contract is an example of valuation by the 
principle of no arbitrage. If the relationship in Lemma 1.5 is violated, then an elementary 
example of a riskless profit opportunity, called an arbitrage, ensues. Here is a definition of 
arbitrage. 

Definition 1.7 (Arbitrage). Let $X = (X_t)_{0 \le t \le T}$ denote the wealth process of a trading
strategy. An arbitrage over $[0, T]$ is a strategy satisfying $X_0 = 0$, $\mathbb{P}[X_T \ge 0] = 1$ and
$\mathbb{P}[X_T > 0] > 0$.

So an arbitrage is guaranteed not to lose money and has a positive probability of making 
a profit. If the valuation formula (1.5) for the forward contract is violated, an immediate 
arbitrage opportunity occurs, as we now illustrate. 

Suppose $f_{t,T} > S_t - K\exp(-r(T - t))$. Then one can short the forward contract and buy
the stock, by borrowing $S_t - f_{t,T}$ at time $t$. At maturity, one sells the stock for $K$ under the
terms of the forward contract and uses the proceeds to pay back the loan, yielding a profit
of

$$K - (S_t - f_{t,T})\exp(r(T - t)) > 0.$$

This is an arbitrage. A symmetrical argument applies if $f_{t,T} < S_t - K\exp(-r(T - t))$ (and
you should supply this).
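The arbitrage can be written out as a small function. This sketch, with a hypothetical function name and illustrative inputs, takes a quoted forward value and returns the riskless profit at $T$ from the strategy above, or from its mirror image when the forward is underpriced; in both cases the profit works out to $e^{r(T-t)}$ times the size of the mispricing.

```python
import math


def forward_arbitrage_profit(f_quoted, S_t, K, r, tau):
    """Riskless time-T profit from a forward quoted at f_quoted instead of S_t - K e^{-r tau}.

    Overpriced: short the forward, buy one share with S_t - f_quoted borrowed;
    at T deliver the share for K and repay (S_t - f_quoted) e^{r tau}.
    Underpriced: go long the forward, short one share and invest S_t - f_quoted;
    at T pay K for the share, return it, and keep the bank balance.
    """
    growth = math.exp(r * tau)
    fair = S_t - K / growth
    if f_quoted > fair:
        return K - (S_t - f_quoted) * growth
    if f_quoted < fair:
        return (S_t - f_quoted) * growth - K
    return 0.0


# Fair value here is 100 - 95 e^{-0.025}, about 7.3456
for quote in (9.0, 6.0):
    print(quote, forward_arbitrage_profit(quote, 100.0, 95.0, 0.05, 0.5))
```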

The principle of riskless hedging and no arbitrage will also apply, rather less trivially, to 
the valuation of options later in the course. 

An equivalent way of looking at no arbitrage is sometimes called the law of one price. 
Two portfolios which give the same payoff at T should have the same value at time t < T. 
Let us show how this applies to the valuation of a forward contract. Consider the following 
two portfolios at time t < T: 
• a long position in one forward contract,

• a long position in the stock plus a short cash position of $K\exp(-r(T - t))$.

At time $T$, these are both worth $S_T - K$, so their values at time $t < T$ must be equal,
yielding $f_{t,T} = S_t - K\exp(-r(T - t))$, as before. Notice that the second portfolio perfectly
replicates (or perfectly hedges) the payoff of the forward contract, meaning that it reproduces
the payoff $S_T - K$. Denote the position in the stock that is needed to perfectly hedge a forward
contract by $H^f_t$. Then we have that $H^f_t = 1$ for all $t \in [0, T]$, and note that

$$H^f_t = f_x(t, S_t) = 1, \qquad 0 \le t \le T,$$

where $f(t, x) := x - Ke^{-r(T - t)}$. This is a simple example of a "delta hedging rule", in
which one differentiates the pricing function of the derivative with respect to the variable
representing the underlying asset price, in order to get the hedging strategy. We will see a
similar result when valuing options.
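The delta hedging rule can be illustrated with a finite-difference derivative. This is a sketch with hypothetical helper names: it differentiates $f(t, x) = x - Ke^{-r(T-t)}$ numerically in $x$ and recovers the hedge ratio $H^f_t = 1$, whatever the stock price.

```python
import math


def forward_pricing_fn(x, K, r, tau):
    """f(t, x) = x - K e^{-r tau}, with tau = T - t."""
    return x - K * math.exp(-r * tau)


def finite_difference_delta(pricing_fn, x, h=1e-4, **params):
    """Central-difference approximation to the derivative of pricing_fn in x."""
    return (pricing_fn(x + h, **params) - pricing_fn(x - h, **params)) / (2.0 * h)


for x in (50.0, 100.0, 150.0):
    print(x, finite_difference_delta(forward_pricing_fn, x, K=95.0, r=0.05, tau=0.5))
# each value is 1 up to floating-point rounding
```

The same helper could be applied to any pricing function of the underlying price, which is how the rule is used for options.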

1.4.1 Forward on dividend-paying stock 

The stock in the preceding analysis was assumed to pay no dividends. Now assume that the 
stock pays dividends as a continuous income stream with dividend yield $q$. This means that
in the interval $[t, t + dt)$, the income received by someone holding one share of the stock will be $qS_t\,dt$.
Suppose that at time $t \in [0, T]$ an agent holds $n_t$ shares of stock. The income received in the
next infinitesimal time interval is $qn_tS_t\,dt$. If this is immediately re-invested in shares at the
price $S_t$, it buys $qn_t\,dt$ further shares, so the holding in shares satisfies

$$dn_t = qn_t\,dt.$$

Hence, if the initial holding is $n_0$ at time zero, we have

$$n_t = n_0\exp(qt), \qquad 0 \le t \le T.$$

In particular, we have

$$n_t = n_T\exp(-q(T - t)), \qquad 0 \le t \le T.$$

This means that in order to hold one share of stock at time $T$, one may buy $\exp(-q(T - t))$
shares at $t < T$ and re-invest the dividends in the stock.
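The re-investment argument can be checked on a discretised time grid. In the sketch below the dividend received over each small step is used to buy more shares at the prevailing price, so the price cancels and the holding follows an Euler step of $dn_t = qn_t\,dt$; starting from $e^{-q(T-t)}$ shares, the holding at $T$ should be close to one share whatever the price path. The random price path and all parameter values are hypothetical.

```python
import math
import random


def reinvested_holding(n_start, q, tau, prices):
    """Share holding at T when the dividend q * n * S * dt received in each step is
    used to buy more shares at the current price S (Euler steps of dn = q n dt).

    prices: stock price at the start of each of len(prices) equal time steps.
    """
    dt = tau / len(prices)
    n = n_start
    for S in prices:
        dividend_cash = q * n * S * dt
        n += dividend_cash / S  # the price cancels: n grows by q n dt
    return n


q, tau, n_steps = 0.03, 2.0, 100_000
dt = tau / n_steps
rng = random.Random(3)
# Hypothetical price path (a multiplicative random walk); any positive path would do
prices, S = [], 100.0
for _ in range(n_steps):
    prices.append(S)
    S *= math.exp(0.2 * math.sqrt(dt) * rng.gauss(0.0, 1.0))

print(reinvested_holding(math.exp(-q * tau), q, tau, prices))  # close to 1 share
```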

If we use this to value a forward contract on the dividend-paying stock we arrive at the 
following. 

Lemma 1.8. The value at time $t \le T$ of a forward contract with delivery price $K$ and
maturity $T$, on a stock with price process $S = (S_t)_{0 \le t \le T}$ paying dividends at a dividend yield
$q$, is given by

$$f_{t,T} = S_t\exp(-q(T - t)) - K\exp(-r(T - t)), \qquad 0 \le t \le T. \tag{1.6}$$

Proof. This is again a hedging argument. Start with zero wealth at time $t < T$, and sell the
contract at time $t$ for some price $f_{t,T}$. Hedge this sale by purchasing $\exp(-q(T - t))$
shares at price $S_t$. This requires borrowing of $S_t\exp(-q(T - t)) - f_{t,T}$. Re-invest all dividends
immediately in the stock, so as to hold one share at time $T$.

At time $T$, sell the asset for price $K$ under the terms of the forward contract, and ensure
that this is enough to pay back the loan. Hence we must have

$$K = \left(S_t\exp(-q(T - t)) - f_{t,T}\right)\exp(r(T - t)),$$

and the result follows.

□ 

Corollary 1.9. The forward price of the dividend-paying asset at time $t \le T$ is given by

$$F_{t,T} = S_t\exp((r - q)(T - t)), \qquad 0 \le t \le T.$$

Proof. Set $f_{t,T} = 0$ in Lemma 1.8; then by definition we must have $K = F_{t,T}$.

□ 

Remark 1.10 (Forwards and futures on currencies). A foreign currency is treated as an
asset which pays a "dividend yield" equal to the foreign interest rate $r_f$. Hence, if
$S = (S_t)_{0 \le t \le T}$ is the exchange rate (the value in dollars of one unit of foreign currency), then
a forward contract on the foreign currency has value at time $t \le T$ given by

$$f_{t,T} = S_t\exp(-r_f(T - t)) - K\exp(-r(T - t)), \qquad 0 \le t \le T,$$

where $T$ is the maturity and $K$ is the delivery price.
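Lemma 1.8, Corollary 1.9 and Remark 1.10 share one formula, with the dividend yield $q$ replaced by the foreign rate $r_f$ for a currency. A sketch, in which the exchange rate and interest rates are made-up inputs for illustration:

```python
import math


def forward_value_with_yield(S_t, K, r, q, tau):
    """f_{t,T} = S_t e^{-q tau} - K e^{-r tau}, tau = T - t.
    For a currency forward, pass the foreign interest rate r_f as q."""
    return S_t * math.exp(-q * tau) - K * math.exp(-r * tau)


def forward_price_with_yield(S_t, r, q, tau):
    """F_{t,T} = S_t e^{(r - q) tau}."""
    return S_t * math.exp((r - q) * tau)


# Hypothetical currency example: 1.25 dollars per unit of foreign currency,
# domestic rate 5%, foreign rate 2%, one year to maturity.
S_t, r, r_f, tau = 1.25, 0.05, 0.02, 1.0
F = forward_price_with_yield(S_t, r, r_f, tau)
print(f"{F:.4f}")                                             # 1.2881
print(f"{forward_value_with_yield(S_t, F, r, r_f, tau):.2e}")  # about zero
```

Setting `q = 0.0` recovers Lemma 1.5 and Corollary 1.6.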

1.5 Options 

An option is a contract that gives the holder the right but not the obligation to buy or sell 
an asset for some price that is defined in advance. 

The two most basic option types are a European call and a European put. 

Definition 1.11 (European call option). A European call option on a stock is a contract
that gives its holder the right (but not the obligation) to purchase the stock at some future
time $T$ (the maturity time) for a price $K$ (the strike price or exercise price) that is fixed at
contract initiation. If $S = (S_t)_{0 \le t \le T}$ denotes the underlying asset's price process, the payoff
of a call option is $(S_T - K)^+$, as shown in Figure 2.

Definition 1.12 (European put option). A European put option on a stock is a contract
that entitles the holder to sell the underlying stock for a fixed price $K$, the strike price, at a
future time $T$. If $S = (S_t)_{0 \le t \le T}$ denotes the underlying asset's price process, the payoff of a
put option is $(K - S_T)^+$, as shown in Figure 3.

The act of choosing to buy or sell the asset under the terms of the option contract is 
called exercising the option. 

Options which can be exercised at any time up to the maturity date $T$ are called American
options, whilst European options can only be exercised at $T$. Hence, an American call
(respectively, put) option allows the holder to buy (respectively, sell) the underlying stock
for price $K$ at any time up to maturity.
Figure 2: Call option payoff

Figure 3: Put option payoff

Lemma 1.13 (Put-call parity). The European call and put prices $c(t, S_t)$ and $p(t, S_t)$ of
options with the same strike $K$ and maturity $T$ on a non-dividend-paying traded stock with price $S_t$ at time
$t \in [0, T]$ are related by

$$c(t, S_t) - p(t, S_t) = S_t - Ke^{-r(T - t)}, \qquad 0 \le t \le T.$$

Proof. The payoffs of a call and put satisfy

$$c(T, S_T) - p(T, S_T) = (S_T - K)^+ - (K - S_T)^+ = S_T - K,$$

which shows the (obvious) fact that a long position in a call combined with a short position
in a put is equivalent to a long position in a forward contract. Hence, their prices at $t \le T$
must satisfy

$$c(t, S_t) - p(t, S_t) = f_{t,T} = S_t - Ke^{-r(T - t)}, \qquad 0 \le t \le T,$$

where $f_{t,T}$ is the value of a forward contract at $t \le T$.

□ 
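Put-call parity can be checked at the level of payoffs and then used to convert a call quote into a put price. The sketch below uses hypothetical inputs; the optional yield `q` covers the dividend-paying case of Remark 1.14.

```python
import math


def call_payoff(S_T, K):
    return max(S_T - K, 0.0)


def put_payoff(S_T, K):
    return max(K - S_T, 0.0)


def put_from_call(c, S_t, K, r, tau, q=0.0):
    """Put-call parity: p = c - S_t e^{-q tau} + K e^{-r tau} (q = 0: no dividends)."""
    return c - S_t * math.exp(-q * tau) + K * math.exp(-r * tau)


K = 100.0
for S_T in (60.0, 100.0, 140.0):
    # a long call plus a short put pays exactly the forward payoff S_T - K
    assert call_payoff(S_T, K) - put_payoff(S_T, K) == S_T - K

# Hypothetical call quote of 10.0 with S_t = K = 100, r = 5%, one year to maturity
print(f"{put_from_call(10.0, 100.0, 100.0, 0.05, 1.0):.4f}")  # 5.1229
```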

Remark 1.14. The same argument applied to a dividend-paying stock yields

$$c(t, S_t) - p(t, S_t) = f_{t,T} = S_te^{-q(T - t)} - Ke^{-r(T - t)}, \qquad 0 \le t \le T,$$

where $q$ is the dividend yield.
1.5.1 European call and put price bounds

Put-call parity is a model-independent result. From it, since $p(t, S_t) \ge 0$, we see that $c(t, S_t) \ge f(t, S_t)$. That
is, a call option is always at least as valuable as a forward contract (an obvious fact).¹

A call option gives its holder the right to buy the underlying stock, which means that
its value can never be greater than that of the stock, so $c(t, S_t) \le S_t$.² From this we deduce
model-independent bounds on a European call option price on a non-dividend-paying stock:

$$S_t - Ke^{-r(T - t)} \le c(t, S_t) \le S_t, \qquad 0 \le t \le T.$$

These bounds are shown in Figure 4. 



Figure 4: Bounds on European call option value

In Figure 4 we have plotted the upper and lower bounds of a European call value (the 
dotted graphs) as well as the Black-Scholes value of the call (the solid graph). We will show 
in these lectures how this function arises. If the above call option pricing bounds are violated, 
then arbitrage opportunities arise. 
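A sketch of the model-independent bounds used as a check on a quoted call price; the function names and inputs are illustrative. A quote outside the interval is exactly the situation in which, as stated above, an arbitrage arises.

```python
import math


def call_price_bounds(S_t, K, r, tau):
    """Model-independent bounds S_t - K e^{-r tau} <= c <= S_t (no dividends)."""
    return S_t - K * math.exp(-r * tau), S_t


def violates_call_bounds(c, S_t, K, r, tau):
    lower, upper = call_price_bounds(S_t, K, r, tau)
    return c < lower or c > upper


lower, upper = call_price_bounds(100.0, 95.0, 0.05, 1.0)
print(f"{lower:.4f} <= c <= {upper:.4f}")  # 9.6332 <= c <= 100.0000
for quote in (5.0, 12.0, 101.0):
    print(quote, violates_call_bounds(quote, 100.0, 95.0, 0.05, 1.0))  # True, False, True
```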

¹ Equivalently, comparing the payoff of a call with that of one share plus a short position of $Ke^{-r(T - t)}$ in
cash, we obtain $S_T - K \le (S_T - K)^+$, and therefore $c(t, S_t) \ge S_t - Ke^{-r(T - t)}$.

² Equivalently, comparing the payoffs of a share and a call, we have $S_T \ge (S_T - K)^+$, and therefore
$c(t, S_t) \le S_t$.
For example, if $c(t, S_t) < S_t - Ke^{-r(T - t)}$, one should buy the call and short the stock,
which gives a cash amount $S_t - c(t, S_t)$ to be invested in a bank account. At time $T$, we have
two possibilities:

1. $S_T \le K$, in which case the call is not exercised. The arbitrageur buys the stock in
the market to close out the short position, using the proceeds from the bank account,
which stand at $(S_t - c(t, S_t))e^{r(T - t)}$ prior to buying the stock. This leaves a profit of

$$(S_t - c(t, S_t))e^{r(T - t)} - S_T \ge (S_t - c(t, S_t))e^{r(T - t)} - K = e^{r(T - t)}\left(S_t - Ke^{-r(T - t)} - c(t, S_t)\right) > 0.$$

2. $S_T > K$, in which case the call is exercised. The arbitrageur buys the stock for $K$ to
close out the short position, using the proceeds from the bank account, which stand at
$(S_t - c(t, S_t))e^{r(T - t)}$ prior to buying the stock. This leaves a profit of

$$(S_t - c(t, S_t))e^{r(T - t)} - K = e^{r(T - t)}\left(S_t - Ke^{-r(T - t)} - c(t, S_t)\right) > 0.$$
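The two cases can be combined into one function of the terminal price. This sketch uses hypothetical inputs: a call quoted at 5, below the lower bound $S_t - Ke^{-r(T-t)} \approx 9.63$ for $S_t = 100$, $K = 95$, $r = 5\%$, $T - t = 1$. Every terminal price then yields a positive profit of at least $e^{r(T-t)}(S_t - Ke^{-r(T-t)} - c)$.

```python
import math


def call_lower_bound_arbitrage(c, S_t, K, r, tau, S_T):
    """Time-T profit from: buy the call at c, short one share at S_t, bank S_t - c.

    Case 1 (S_T <= K): the call expires unexercised; buy the share in the market.
    Case 2 (S_T > K): exercise the call and buy the share for K.
    Either way the short position is closed out using the bank balance.
    """
    bank_at_T = (S_t - c) * math.exp(r * tau)
    cost_of_share = S_T if S_T <= K else K
    return bank_at_T - cost_of_share


c, S_t, K, r, tau = 5.0, 100.0, 95.0, 0.05, 1.0
guaranteed = math.exp(r * tau) * (S_t - K * math.exp(-r * tau) - c)
for S_T in (0.0, 60.0, 95.0, 150.0):
    print(S_T, call_lower_bound_arbitrage(c, S_t, K, r, tau, S_T), ">=", guaranteed)
```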

We can derive similar model-independent bounds on a put option price. A put option
gives its holder the right to receive an amount $K$ for the stock, so the most it can be worth
at maturity is $K$ (if the final stock price is $S_T = 0$). Hence, its current value can never be
greater than the present value of $K$, so that

$$p(t, S_t) \le Ke^{-r(T - t)}, \qquad 0 \le t \le T.$$

Similarly, for the value of a put at expiry we have $p(T, S_T) = (K - S_T)^+ \ge K - S_T$. That is,
a put option is at least as valuable as a short position in a forward contract. Hence we have
the lower bound

$$p(t, S_t) \ge Ke^{-r(T - t)} - S_t, \qquad 0 \le t \le T.$$
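The corresponding put bounds in the same style; as before, the function name and numbers are purely illustrative.

```python
import math


def put_price_bounds(S_t, K, r, tau):
    """Model-independent bounds K e^{-r tau} - S_t <= p <= K e^{-r tau} (no dividends)."""
    discounted_strike = K * math.exp(-r * tau)
    return discounted_strike - S_t, discounted_strike


lower, upper = put_price_bounds(S_t=90.0, K=100.0, r=0.05, tau=1.0)
print(f"{lower:.4f} <= p <= {upper:.4f}")  # 5.1229 <= p <= 95.1229
```

When $S_t > Ke^{-r(T-t)}$ the lower bound is negative and the binding lower bound is simply $p \ge 0$.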

The results in this section are model-independent. To say more about option values we need
a model for the dynamic evolution of a stock price. One of the simplest continuous-time
models is the Black-Scholes-Merton (BSM) model, which we shall describe later, and one of
the simplest discrete-time models is the binomial model, which we shall also see shortly.

1.5.2 Combinations of options 

Options can be combined to give a variety of payoffs for different hedging purposes, or for 
speculation on movements in the underlying asset price, and they are often used to do so 
because the option premiums are relatively small in some cases, thus proving attractive to 
gamblers. 

A straddle is a call and a put with the same strike and maturity. The payoff of a long 
position in a straddle is 

$$(S_T - K)^+ + (K - S_T)^+ = |S_T - K| = \begin{cases} S_T - K, & S_T \ge K, \\ K - S_T, & S_T < K. \end{cases} \tag{1.7}$$

This payoff is illustrated in Figure 5. 
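The straddle payoff is simply the distance $|S_T - K|$ between the terminal price and the strike, so the position pays off on a large move in either direction. A minimal sketch with an illustrative strike of 100:

```python
def straddle_payoff(S_T, K):
    """Long call plus long put, same strike K and maturity: (S_T - K)^+ + (K - S_T)^+."""
    return max(S_T - K, 0.0) + max(K - S_T, 0.0)


K = 100.0
for S_T in (70.0, 100.0, 130.0):
    print(S_T, straddle_payoff(S_T, K))  # 30.0, 0.0, 30.0
```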
Figure 5: Straddle payoff

1.6 Some history* 

As remarked earlier, the origins of derivatives lie in medieval agreements between farmers 
and merchants to insure farmers against low crop prices. 

The Chicago Board of Trade, founded in 1848, began trading standardised commodity futures
(contracts that set trading prices of commodities in advance) in the 1860s, formalising the act of hedging
against future price changes of important products.

Options were first valued by Bachelier in 1900 in his PhD thesis, a translation of which 
can be found in the book by Davis and Etheridge [3]. Bachelier introduced a stochastic 
process now known as Brownian motion (BM) to model stock price movements in continuous 
time. Bachelier did this before a rigorous treatment of BM was available in mathematics. 
His work was decades ahead of its time, both mathematically and economically speaking, 
and was therefore not given the credit it deserved at the time. In the decades that followed, 
mathematicians and physicists (Einstein, Wiener, Lévy, Kolmogorov, Feller to name but a
few) developed a rigorous theory of Brownian motion, and Ito developed a rigorous theory of 
stochastic integration with respect to Brownian motion, leading to the notion of a stochastic 
calculus, which we shall encounter. In the 1960s, economists re-discovered Bachelier's work, 
and this was one of the ingredients that led to the modern theory of option valuation. 

In the early 1970s a combination of forces existed which made markets more risky, derivatives
more prominent, and their valuation and trading possible. The system of fixed exchange
rates that existed before 1970 collapsed, and the Middle East oil crises caused a big increase 
in the volatility of financial prices. This increased the demand for risk management products 
such as options. At the same time Black and Scholes [2] and Merton [10] (BSM) published 
their seminal work on how to price options, based on managing the risk associated with 
selling such an asset. This breakthrough, for which Scholes and Merton received a Nobel 
Prize (Black having passed away in 1995) coincided with the opening of the Chicago Board
Options Exchange (CBOE), giving individuals both a means to value option contracts and 
a marketplace where they could profit from this knowledge of the fair price. 

Following on from this, the financial deregulation of the 1980s, allied to technological 
developments which made it possible to trade securities globally and to run large portfolios 
of complex products, caused a huge increase in risky trading across international borders. 
This opened up yet more risks across currencies, interest rates and equities, and financial 
institutions very skillfully (or opportunistically, perhaps) created markets to trade derivatives 
and to sell these products to customers. This has led to the massive increase in derivative 
trading that we now see, with the volume of derivative contracts traded now dwarfing that 
in the associated underlying assets. 

The papers of Black-Scholes [2] and Merton [10] attracted mathematicians to the subject, 
and led to a mathematically rigorous approach to valuing derivatives, based on probability 
and martingale theory, inspired by Harrison and Pliska [6]. This led directly to modern 
financial mathematics, and has also contributed to the advent of derivatives written on a 
plethora of underlying stochastic reference entities, such as interest rates, weather indices, 
default events, as well as on more traditional traded underlying securities such as stocks, 
currencies and interest rates. 



The binomial model is a simple dynamic model of a stock price process in which a fictitious 
coin is tossed, and a stock price depends on the outcome of the coin tosses. 

Let $\mathbb{T} := \{0, 1, \ldots, n\}$ represent a discrete time set. Let $\Omega = \Omega_n$, the set of all outcomes of
$n$ coin tosses. The finite set $\Omega$ is called the sample space, with elements $\omega \in \Omega$ called sample
points, representing the possible outcomes of the random experiment in which the coin is
tossed. Each sample point $\omega \in \Omega$ is a sequence of length $n$, written as $\omega = \omega_1\omega_2\cdots\omega_n$, where
each $\omega_t$, $t = 1, \ldots, n$, is either $H$ (head) or $T$ (tail), representing the outcome of the $t$th coin toss.
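For small $n$ the sample space can be listed explicitly. A sketch that represents each $\omega$ as a string of `H` and `T` characters (a representation chosen here for convenience, not prescribed by the notes):

```python
from itertools import product


def sample_space(n):
    """All 2^n outcomes of n coin tosses, each written as a string such as 'HTH'."""
    return ["".join(tosses) for tosses in product("HT", repeat=n)]


print(sample_space(2))        # ['HH', 'HT', 'TH', 'TT']
print(len(sample_space(10)))  # 1024
```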

Let $\mathcal{F}$ be the set of all subsets of $\Omega$; $\mathcal{F}$ is a $\sigma$-algebra (or $\sigma$-field), that is, a collection of
subsets of $\Omega$ with the properties: (i) $\emptyset \in \mathcal{F}$, (ii) if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$, (iii) if $A_1, A_2, \ldots$ is
a sequence of sets in $\mathcal{F}$, then $\bigcup_{k=1}^{\infty} A_k$ is also in $\mathcal{F}$. We interpret $\sigma$-algebras as a record of
information (as we shall see shortly). The pair $(\Omega, \mathcal{F})$ is called a measurable space.

We place a probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$. A probability measure $\mathbb{P}$ is a function
$\mathbb{P} : \mathcal{F} \to [0, 1]$ with the properties: (i) $\mathbb{P}(\Omega) = 1$, (ii) if $A_1, A_2, \ldots$ is a sequence
of disjoint sets in $\mathcal{F}$, then $\mathbb{P}\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} \mathbb{P}(A_k)$. The interpretation is that, for a set
$A \in \mathcal{F}$, there is a probability in $[0, 1]$ that the outcome of a random experiment will lie in
the set $A$. We think of $\mathbb{P}(A)$ as this probability. The set $A \in \mathcal{F}$ is called an event.

For $A \in \mathcal{F}$ we define

$$\mathbb{P}(A) := \sum_{\omega \in A} \mathbb{P}(\omega). \tag{2.1}$$

We can define $\mathbb{P}(A)$ in this way because $A$ has only finitely many elements.

Suppose the probability of $H$ on each coin toss is $p \in (0, 1)$; then the probability of $T$ is
$q := 1 - p$. For each $\omega = \omega_1\omega_2\cdots\omega_n \in \Omega$ we define

$$\mathbb{P}(\omega) := p^{\,\text{number of } H \text{ in } \omega}\; q^{\,\text{number of } T \text{ in } \omega}.$$

Then for each $A \in \mathcal{F}$ we define $\mathbb{P}(A)$ according to (2.1).
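The definitions of $\mathbb{P}(\omega)$ and of $\mathbb{P}(A)$, as the sum of $\mathbb{P}(\omega)$ over the finitely many $\omega \in A$, can be coded directly. The sketch below uses exact rational arithmetic and an illustrative head probability $p = 2/3$, and checks that $\mathbb{P}(\Omega) = 1$ and that the event "first toss is a head" has probability $p$.

```python
from fractions import Fraction
from itertools import product


def prob_outcome(omega, p):
    """P(omega) = p^(number of H in omega) * q^(number of T in omega), q = 1 - p."""
    return p ** omega.count("H") * (1 - p) ** omega.count("T")


def prob_event(event, p):
    """P(A): the sum of P(omega) over the finitely many omega in A."""
    return sum(prob_outcome(omega, p) for omega in event)


n, p = 3, Fraction(2, 3)
omega_space = ["".join(w) for w in product("HT", repeat=n)]
first_toss_head = [w for w in omega_space if w[0] == "H"]
print(prob_event(omega_space, p))      # 1
print(prob_event(first_toss_head, p))  # 2/3
```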

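Definition 2.1 (Filtration). A filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{T}}$ is an increasing sequence of $\sigma$-algebras
$\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_n$. That is, $\mathcal{F}_s \subseteq \mathcal{F}_t$ if $s \le t$.

A probability space $(\Omega, \mathcal{F}, \mathbb{P})$ equipped with a filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{T}}$, with each $\mathcal{F}_t \subseteq \mathcal{F}$,
is called a filtered probability space.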

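An $\mathbb{R}$-valued random variable $X = X(\omega)$ on $(\Omega, \mathcal{F})$ is a measurable function $X: \Omega \to \mathbb{R}$. That is, the set $X^{-1}(A) = \{\omega \in \Omega : X(\omega) \in A\} = \{X \in A\}$ belongs to $\mathcal{F}$ for every Borel set $A \subseteq \mathbb{R}$.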

Since a random variable maps $\Omega$ into $\mathbb{R}$, we can look at the preimage, under the random variable, of sets in $\mathbb{R}$. These are sets of the form

$$X^{-1}(A) = \{\omega \in \Omega : X(\omega) \in A\} = \{X \in A\}, \quad A \subseteq \mathbb{R},$$

which are, of course, subsets of $\Omega$. The complete list of subsets of $\Omega$ that you can get as preimages (under $X$) of sets in $\mathbb{R}$ turns out to be a $\sigma$-algebra. Its content is exactly the information obtained by observing $X$, and it is called the $\sigma$-algebra generated by the random variable $X$.

Definition 2.2. Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. Let $X$ be a random variable on $(\Omega, \mathcal{F})$. The $\sigma$-algebra $\sigma(X)$ generated by $X$ is defined to be the collection of all sets of the form $\{\omega \in \Omega : X(\omega) \in A\}$, where $A$ is a subset of $\mathbb{R}$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. We say that $X$ is $\mathcal{G}$-measurable if every set in $\sigma(X)$ is also in $\mathcal{G}$.

Definition 2.3 (Induced measure (distribution) of a random variable $X$). Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. For $A \subseteq \mathbb{R}$, we define the induced measure of the set $A$ to be

$$\mu_X(A) := \mathbb{P}\{\omega \in \Omega : X(\omega) \in A\} = \mathbb{P}\{X \in A\}.$$

So the induced measure of a set $A$ tells us the probability that $X$ takes a value in $A$. By the distribution of a random variable $X$, we mean any of the several ways of characterizing $\mu_X$. For this reason we shall sometimes refer to the induced measure $\mu_X$ associated with a random variable $X$ as the distribution of $X$.

Remark 2.4. We make a clear distinction between random variables and their distributions. A random variable is a mapping from $\Omega$ to $\mathbb{R}$, nothing more, and it exists quite apart from any discussion of probabilities. The distribution of a random variable is a measure $\mu_X$ on $\mathbb{R}$, that is, a way of assigning probabilities to sets in $\mathbb{R}$. It depends both on the random variable $X$ and on the probability measure $\mathbb{P}$ we use on $\Omega$. If we change $\mathbb{P}$, we change the distribution of the random variable $X$, but not the random variable itself. Thus a random variable can have more than one distribution. Examples are an objective or "market" distribution and a "risk-neutral" distribution, and we shall see such constructs in finance. In a similar vein, two different random variables can have the same distribution.
Definition 2.5. Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. The expected value or expectation of $X$ is defined to be

$$\mathbb{E}[X] := \sum_{\omega \in \Omega} X(\omega)\mathbb{P}(\omega). \qquad (2.3)$$

Notice that this is a sum over the sample space $\Omega$. When $\Omega$ is finite and the random variable $X$ takes on a finite number of values, this sum over $\Omega$ can be converted to a sum over $\mathbb{R}$, as shown in the Background Probability notes. It is easy to see that for the indicator function $\mathbb{1}_A$ of an event $A \in \mathcal{F}$, the definition of expectation gives $\mathbb{E}[\mathbb{1}_A] = \mathbb{P}(A)$.

Remark 2.6. When the sample space $\Omega$ is infinite, and in particular uncountable, the summation in the definition of expectation is replaced by an integral. In general, the integral over an abstract measurable space $(\Omega, \mathcal{F})$ with respect to a probability measure $\mathbb{P}$ is a so-called Lebesgue integral. It has all the linearity and comparison properties we associate with ordinary integrals. The expectation $\mathbb{E}[X]$ becomes the Lebesgue integral over $\Omega$ of $X$ with respect to $\mathbb{P}$, written as

$$\mathbb{E}[X] = \int_\Omega X\,d\mathbb{P} = \int_\Omega X(\omega)\,d\mathbb{P}(\omega) = \int_{\mathbb{R}} x\,d\mu_X(x) = \int_{\mathbb{R}} x\,\mu_X(dx). \qquad (2.4)$$

When $X$ takes on a continuum of values and has a density $f_X$, then $d\mu_X(x) = f_X(x)\,dx$, and the integral on the right-hand side of (2.4) reduces to the familiar Riemann integral $\int_{\mathbb{R}} x f_X(x)\,dx$.

We do not delve into the construction of Lebesgue integrals over abstract spaces here. When $\Omega$ is finite, simply think of the integrals in (2.4) as alternative notation for the sum $\sum_{\omega \in \Omega} X(\omega)\mathbb{P}(\omega)$. See Williams [14] or Shreve [13] for more details on Lebesgue integration.

Definition 2.7 (Discrete-time stochastic process). Let $\mathbb{T} = \{0\} \cup \mathbb{N} = \{0, 1, 2, \dots\}$ be a discrete time set. A discrete-time stochastic process $(X_t)_{t \in \mathbb{T}}$ is a sequence of random variables.

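Definition 2.8. Let $\mathbb{T} = \{0\} \cup \mathbb{N}$. A stochastic process $(X_t)_{t \in \mathbb{T}}$ on a filtered space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \mathbb{T}})$ is adapted to the filtration $(\mathcal{F}_t)_{t \in \mathbb{T}}$ if $X_t$ is $\mathcal{F}_t$-measurable for each $t \in \mathbb{T}$.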

3 Conditional expectation 

Definition 3.1 (Conditional expectation). Let $X$ be a random variable on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. The conditional expectation $\mathbb{E}[X|\mathcal{G}]$ is defined to be any random variable $Y$ that satisfies

1. $Y = \mathbb{E}[X|\mathcal{G}]$ is $\mathcal{G}$-measurable;

2. For every set $A \in \mathcal{G}$, we have the partial averaging property

$$\int_A Y\,d\mathbb{P} = \int_A \mathbb{E}[X|\mathcal{G}]\,d\mathbb{P} = \int_A X\,d\mathbb{P}.$$
Note (we do not prove this here) that there is always a random variable $Y$ satisfying the above properties, provided that $\mathbb{E}|X| < \infty$. In other words, conditional expectations always exist. There can be more than one random variable satisfying the above properties, but if $Y'$ is another one, then $Y = Y'$ with probability 1 (or almost surely (a.s.)).

For random variables $X, Y$ it is standard notation to write

$$\mathbb{E}[X|Y] := \mathbb{E}[X|\sigma(Y)].$$

3.1 Partial averaging 

The partial averaging property is

$$\int_A \mathbb{E}[X|\mathcal{G}]\,d\mathbb{P} = \int_A X\,d\mathbb{P}, \quad \forall A \in \mathcal{G}.$$

We can rewrite this as

$$\mathbb{E}\left[\mathbb{1}_A \cdot \mathbb{E}[X|\mathcal{G}]\right] = \mathbb{E}[\mathbb{1}_A \cdot X]. \qquad (3.1)$$

Note that $\mathbb{1}_A(\omega)$, which equals 1 for $\omega \in A$ and 0 otherwise, is a $\mathcal{G}$-measurable random variable. Equation (3.1) suggests that the following holds, and it is indeed true (see the Background Probability notes for a proof).

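Lemma 3.2. If $V$ is any $\mathcal{G}$-measurable random variable, then, provided $\mathbb{E}\left|V \cdot \mathbb{E}[X|\mathcal{G}]\right| < \infty$,

$$\mathbb{E}\left[V \cdot \mathbb{E}[X|\mathcal{G}]\right] = \mathbb{E}[V \cdot X]. \qquad (3.2)$$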

Based on Lemma 3.2, we can replace the second condition in the definition of conditional expectation by (3.2). The defining properties of $Y = \mathbb{E}[X|\mathcal{G}]$ are then:

1. $Y = \mathbb{E}[X|\mathcal{G}]$ is $\mathcal{G}$-measurable;

2. For every $\mathcal{G}$-measurable random variable $V$, we have

$$\mathbb{E}\left[V \cdot \mathbb{E}[X|\mathcal{G}]\right] = \mathbb{E}[V \cdot X]. \qquad (3.3)$$

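Notice that we can write (3.3) as

$$\mathbb{E}\left[V \cdot \left(\mathbb{E}[X|\mathcal{G}] - X\right)\right] = 0.$$

This allows us to interpret $\mathbb{E}[X|\mathcal{G}]$ as the projection of the vector $X$ onto the subspace of $\mathcal{G}$-measurable random variables. Under this interpretation, $\mathbb{E}[X|\mathcal{G}] - X$ is perpendicular to every $V$ in that subspace.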
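The orthogonality relation can be checked numerically on a finite sample space. On such a space every sub-$\sigma$-algebra $\mathcal{G}$ is generated by a partition of $\Omega$ into atoms. The sketch below assumes each atom has positive probability. Under that assumption, $\mathbb{E}[X|\mathcal{G}]$ on each atom is the partial average of $X$ over that atom. The helper `cond_exp`, the value $p = 1/3$ and the variables `X` and `V` are illustrative choices, not taken from the notes.

```python
from fractions import Fraction
from itertools import product

def cond_exp(X, partition, P):
    """E[X|G] for G generated by a finite partition of Omega (atoms with P > 0).
    On each atom B the value is sum_{w in B} X(w)P(w) / P(B)."""
    Y = {}
    for B in partition:
        PB = sum(P[w] for w in B)
        avg = sum(X[w] * P[w] for w in B) / PB
        for w in B:
            Y[w] = avg
    return Y

p = Fraction(1, 3)
omega = ["".join(w) for w in product("HT", repeat=2)]
P = {w: p ** w.count("H") * (1 - p) ** w.count("T") for w in omega}
X = {w: w.count("H") for w in omega}     # number of heads
G = [["HH", "HT"], ["TH", "TT"]]          # information of the first toss
Y = cond_exp(X, G, P)

# Any G-measurable V is constant on each atom; pick arbitrary constants.
V = {"HH": 5, "HT": 5, "TH": -2, "TT": -2}
orth = sum(V[w] * (Y[w] - X[w]) * P[w] for w in omega)
print(orth)   # 0
```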

3.2 Properties of conditional expectation 

Proofs of the properties below are given in the Background Probability notes. All the $X$ below satisfy $\mathbb{E}|X| < \infty$.

1. $\mathbb{E}\left[\mathbb{E}[X|\mathcal{G}]\right] = \mathbb{E}[X]$.

The conditional expectation of $X$ is thus an unbiased estimator of the random variable $X$.
2. If $X$ is $\mathcal{G}$-measurable, then $\mathbb{E}[X|\mathcal{G}] = X$.

In other words, if the information content of $\mathcal{G}$ is sufficient to determine $X$, then the best estimate of $X$ based on $\mathcal{G}$ is $X$ itself.

3. (Linearity) For $a_1, a_2 \in \mathbb{R}$,

$$\mathbb{E}[a_1X_1 + a_2X_2|\mathcal{G}] = a_1\mathbb{E}[X_1|\mathcal{G}] + a_2\mathbb{E}[X_2|\mathcal{G}].$$

4. (Positivity) If $X \ge 0$ almost surely, then $\mathbb{E}[X|\mathcal{G}] \ge 0$ almost surely.

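5. (Jensen's inequality) If $\varphi: \mathbb{R} \to \mathbb{R}$ is convex and $\mathbb{E}|\varphi(X)| < \infty$, then

$$\mathbb{E}[\varphi(X)|\mathcal{G}] \ge \varphi\left(\mathbb{E}[X|\mathcal{G}]\right).$$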

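6. (Tower property) If $\mathcal{H}$ is a sub-$\sigma$-algebra of $\mathcal{G}$, then

$$\mathbb{E}\left[\mathbb{E}[X|\mathcal{G}]\,\middle|\,\mathcal{H}\right] = \mathbb{E}[X|\mathcal{H}] \quad \text{a.s.}$$

The intuition here is that $\mathcal{G}$ contains more information than $\mathcal{H}$. Suppose we estimate $X$ based on the information in $\mathcal{G}$, and then estimate that estimator based on the smaller amount of information in $\mathcal{H}$. We get the same result as if we had estimated $X$ directly from the information in $\mathcal{H}$.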

7. (Taking out what is known) If $Z$ is $\mathcal{G}$-measurable, then

$$\mathbb{E}[ZX|\mathcal{G}] = Z \cdot \mathbb{E}[X|\mathcal{G}].$$

8. (Role of independence) If $X$ is independent of $\mathcal{H}$ (i.e. if $\sigma(X)$ and $\mathcal{H}$ are independent $\sigma$-algebras), then

$$\mathbb{E}[X|\mathcal{H}] = \mathbb{E}[X]. \qquad (3.4)$$

The intuition behind (3.4) is as follows. If $X$ is independent of $\mathcal{H}$, then the best estimate of $X$ based on the information in $\mathcal{H}$ is $\mathbb{E}[X]$. This is the same as the best estimate of $X$ based on no information.
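Properties 6 and 8 can be tested in the same way on the 3-period coin toss space. This sketch takes $\mathcal{G}$ to be the information of the first two tosses and $\mathcal{H}$ the information of the first toss, so that $\mathcal{H} \subseteq \mathcal{G}$. The random variable `X`, the value $p = 2/5$ and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def cond_exp(X, partition, P):
    """E[X|G] for the sigma-algebra G generated by a finite partition of Omega."""
    Y = {}
    for B in partition:
        PB = sum(P[w] for w in B)
        avg = sum(X[w] * P[w] for w in B) / PB
        Y.update({w: avg for w in B})
    return Y

def partition_by_first(omega, t):
    """Atoms of F_t: outcomes grouped by their first t tosses."""
    atoms = {}
    for w in omega:
        atoms.setdefault(w[:t], []).append(w)
    return list(atoms.values())

p = Fraction(2, 5)
omega = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * (1 - p) ** w.count("T") for w in omega}
X = {w: 3 * w.count("H") + (w[2] == "T") for w in omega}   # an arbitrary random variable

G, H = partition_by_first(omega, 2), partition_by_first(omega, 1)   # H is coarser than G
lhs = cond_exp(cond_exp(X, G, P), H, P)
rhs = cond_exp(X, H, P)
print(lhs == rhs)   # True (tower property)

# Role of independence: the third toss is independent of F_2.
third_is_H = {w: int(w[2] == "H") for w in omega}
print(all(v == p for v in cond_exp(third_is_H, G, P).values()))   # True
```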

4 Martingales 

Definition 4.1 (Martingale). A stochastic process $M = (M_t)_{t=0}^T$ on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t=0}^T, \mathbb{P})$ is a martingale with respect to the filtration $\mathbb{F}$ if the following hold. (i) Each $M_t$ is $\mathcal{F}_t$-measurable, so the process $(M_t)_{t=0}^T$ is adapted to the filtration $(\mathcal{F}_t)_{t=0}^T$. (ii) For each $t \in \{0, 1, \dots, T\}$, $\mathbb{E}[|M_t|] < \infty$. (iii) The process satisfies

$$\mathbb{E}[M_{t+1}|\mathcal{F}_t] = M_t, \quad t = 0, 1, \dots, T-1.$$

So martingales tend to go neither up nor down. A supermartingale tends to go down: the third condition above is replaced by $\mathbb{E}[M_{t+1}|\mathcal{F}_t] \le M_t$. A submartingale tends to go up: $\mathbb{E}[M_{t+1}|\mathcal{F}_t] \ge M_t$.

A simple argument using the tower property and induction shows the following. 
Lemma 4.2.

$$\mathbb{E}[M_{t+u}|\mathcal{F}_t] = M_t$$

for arbitrary $u \in \{1, \dots, T - t\}$.

Proof. Consider $\mathbb{E}[M_{t+2}|\mathcal{F}_t]$. By the tower property,

$$\mathbb{E}[M_{t+2}|\mathcal{F}_t] = \mathbb{E}\left[\mathbb{E}[M_{t+2}|\mathcal{F}_{t+1}]\,\middle|\,\mathcal{F}_t\right] = \mathbb{E}[M_{t+1}|\mathcal{F}_t] = M_t.$$

Continuing in this fashion we get

$$\mathbb{E}[M_{t+u}|\mathcal{F}_t] = M_t, \quad u = 1, 2, \dots, T - t.$$

□

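Lemma 4.3. Let $X$ be an integrable random variable ($\mathbb{E}[|X|] < \infty$) on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t=0}^T, \mathbb{P})$. Define

$$M_t := \mathbb{E}[X|\mathcal{F}_t], \quad t \in \{0, 1, \dots, T\}.$$

Then $M := (M_t)_{t=0}^T$ is a $(\mathbb{P}, \mathbb{F})$-martingale.

Proof. We have

$$\begin{aligned} \mathbb{E}[M_{t+1}|\mathcal{F}_t] &= \mathbb{E}\left[\mathbb{E}[X|\mathcal{F}_{t+1}]\,\middle|\,\mathcal{F}_t\right] \\ &= \mathbb{E}[X|\mathcal{F}_t] \quad \text{(by the tower property)} \\ &= M_t. \end{aligned}$$

□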

Definition 4.4 (Predictable process). A predictable process $(\xi_t)_{t=1}^T$ on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t=0}^T, \mathbb{P})$ is one such that, for each $t \in \{1, \dots, T\}$, $\xi_t$ is $\mathcal{F}_{t-1}$-measurable.

The following two propositions are proven in the Background Probability notes. 

Proposition 4.5. Let $(M_t)_{t=0}^T$ be a martingale on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t=0}^T, \mathbb{P})$. Let $(\xi_t)_{t=1}^T$ be a bounded predictable process. Then the process $N := (N_t)_{t=0}^T$ defined by

$$N_0 := 0, \qquad N_t := \sum_{s=1}^t \xi_s(M_s - M_{s-1}), \quad t = 1, \dots, T,$$

is a $(\mathbb{P}, \mathbb{F})$-martingale.

Remark 4.6. The process $N$ is called a martingale transform or discrete-time stochastic integral. It is sometimes denoted $N_t = \int_0^t \xi_s\,dM_s$ or $N_t = (\xi \cdot M)_t$.
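Proposition 4.5 can be illustrated by exhaustive enumeration. The sketch below assumes a fair coin, so that the symmetric random walk $M_t$ (up 1 on H, down 1 on T) is a martingale. It uses a hypothetical bounded predictable integrand $\xi_t$ that depends only on the first $t-1$ tosses. The martingale property of both $M$ and the transform $N$ is then checked on every atom of every $\mathcal{F}_t$.

```python
from fractions import Fraction
from itertools import product

T = 4
p = Fraction(1, 2)    # fair coin, so the random walk M below is a martingale
omega = ["".join(w) for w in product("HT", repeat=T)]
P = {w: p ** w.count("H") * (1 - p) ** w.count("T") for w in omega}

def M(w, t):
    """Symmetric random walk: +1 for each H, -1 for each T in the first t tosses."""
    return sum(1 if c == "H" else -1 for c in w[:t])

def xi(w, t):
    """Hypothetical predictable integrand: uses only the first t-1 tosses
    (an illustrative 'bet more after heads' rule)."""
    return 1 + w[:t - 1].count("H")

def N(w, t):
    """Martingale transform N_t = sum_{s=1}^t xi_s (M_s - M_{s-1}), with N_0 = 0."""
    return sum(xi(w, s) * (M(w, s) - M(w, s - 1)) for s in range(1, t + 1))

def is_martingale(X):
    """Check E[X_{t+1} | F_t] = X_t on every atom of F_t (outcomes sharing t tosses)."""
    for t in range(T):
        for prefix in {w[:t] for w in omega}:
            atom = [w for w in omega if w[:t] == prefix]
            PB = sum(P[w] for w in atom)
            if sum(X(w, t + 1) * P[w] for w in atom) / PB != X(atom[0], t):
                return False
    return True

print(is_martingale(M), is_martingale(N))   # True True
```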

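Proposition 4.7. On a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t=0}^T, \mathbb{P})$, an adapted sequence of real random variables $M = (M_t)_{t=0}^T$ is a $(\mathbb{P}, \mathbb{F})$-martingale if and only if, for any predictable process $\xi = (\xi_t)_{t=1}^T$, we have

$$\mathbb{E}\left[\sum_{s=1}^t \xi_s(M_s - M_{s-1})\right] = 0, \quad t = 1, \dots, T.$$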
5 Equivalent measures

Definition 5.1 (Absolutely continuous measures). Let $\mathbb{P}$ and $\mathbb{Q}$ be two probability measures on a measurable space $(\Omega, \mathcal{F})$. Assume that for every $A \in \mathcal{F}$ satisfying $\mathbb{P}(A) = 0$, we also have $\mathbb{Q}(A) = 0$. Then we say $\mathbb{Q}$ is absolutely continuous with respect to $\mathbb{P}$, written $\mathbb{Q} \ll \mathbb{P}$.

Here is a deep theorem, which we do not prove. 

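Theorem 5.2 (Radon-Nikodym). Let $\mathbb{P}$ and $\mathbb{Q}$ be two probability measures on a measurable space $(\Omega, \mathcal{F})$, such that $\mathbb{Q}$ is absolutely continuous with respect to $\mathbb{P}$. Under this assumption, there is a nonnegative random variable $Z$ such that

$$\mathbb{Q}(A) = \int_A Z\,d\mathbb{P}, \quad \forall A \in \mathcal{F}, \qquad (5.1)$$

and $Z$ is called the Radon-Nikodym derivative of $\mathbb{Q}$ with respect to $\mathbb{P}$. The random variable $Z$ is often written as

$$Z = \frac{d\mathbb{Q}}{d\mathbb{P}}.$$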

Equation (5.1) implies the apparently stronger condition

$$\mathbb{E}^{\mathbb{Q}}[X] = \mathbb{E}[XZ]$$

for every random variable $X$ for which $\mathbb{E}|XZ| < \infty$. To see this, note that (5.1) in Theorem 5.2 is equivalent to

$$\mathbb{E}^{\mathbb{Q}}[\mathbb{1}_A] = \mathbb{E}[\mathbb{1}_A Z], \quad A \in \mathcal{F}.$$

This is then extended to general $X$ via an argument that is called the "standard machine" in Section 1.5 of Shreve [13] (see the Background Probability notes for examples of this).

Definition 5.3 (Equivalent measures). If $\mathbb{Q}$ is absolutely continuous with respect to $\mathbb{P}$ and $\mathbb{P}$ is absolutely continuous with respect to $\mathbb{Q}$, then we say that $\mathbb{P}$ and $\mathbb{Q}$ are equivalent, written $\mathbb{Q} \sim \mathbb{P}$.

In other words, $\mathbb{P}$ and $\mathbb{Q}$ are equivalent if and only if

$$\mathbb{P}(A) = 0 \text{ exactly when } \mathbb{Q}(A) = 0, \quad \forall A \in \mathcal{F}.$$

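If $\mathbb{P}$ and $\mathbb{Q}$ are equivalent and $Z$ is the Radon-Nikodym derivative of $\mathbb{Q}$ w.r.t. $\mathbb{P}$, then $1/Z$ is the Radon-Nikodym derivative of $\mathbb{P}$ w.r.t. $\mathbb{Q}$. That is,

$$\mathbb{E}^{\mathbb{Q}}[X] = \mathbb{E}[XZ] \quad \forall X, \qquad (5.2)$$

$$\mathbb{E}[Y] = \mathbb{E}^{\mathbb{Q}}\left[Y \cdot \frac{1}{Z}\right] \quad \forall Y. \qquad (5.3)$$

Letting $X$ and $Y$ be related by $Y = XZ$, we see that the above two equations are the same.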
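Example 5.4 (Radon-Nikodym theorem in 2-period coin toss space). Let $\Omega = \Omega_2$ be given by

$$\Omega_2 = \{HH, HT, TH, TT\},$$

the set of coin toss sequences of length 2. Let $\mathbb{P}$ correspond to probability $p$ for H and $q = 1 - p$ for T. Let $\mathbb{Q}$ correspond to probability $\tilde p$ for H and $\tilde q = 1 - \tilde p$ for T, with $p, \tilde p \in (0, 1)$. Then the Radon-Nikodym derivative of $\mathbb{Q}$ w.r.t. $\mathbb{P}$ is easily seen to be

$$Z(\omega) = \frac{\mathbb{Q}(\omega)}{\mathbb{P}(\omega)}, \quad \omega \in \Omega_2,$$

so that

$$Z(HH) = \frac{\tilde p^2}{p^2}, \quad Z(HT) = Z(TH) = \frac{\tilde p\tilde q}{pq}, \quad Z(TT) = \frac{\tilde q^2}{q^2}.$$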
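On $\Omega_2$ the Radon-Nikodym derivative is just a ratio of point probabilities, so (5.2) and (5.3) can be verified by direct summation. In the sketch below, the values $p = 2/3$ under $\mathbb{P}$ and $\tilde p = 1/2$ under $\mathbb{Q}$ are illustrative assumptions only. The random variable `X` is likewise arbitrary.

```python
from fractions import Fraction
from itertools import product

def coin_measure(p, n=2):
    """Product measure on n tosses with probability p of H on each toss."""
    measure = {}
    for toss in product("HT", repeat=n):
        w = "".join(toss)
        measure[w] = p ** w.count("H") * (1 - p) ** w.count("T")
    return measure

P = coin_measure(Fraction(2, 3))   # illustrative choice for P
Q = coin_measure(Fraction(1, 2))   # illustrative choice for Q
Z = {w: Q[w] / P[w] for w in P}    # Radon-Nikodym derivative dQ/dP

X = {"HH": 7, "HT": -1, "TH": 3, "TT": 0}    # any random variable
E_Q = sum(X[w] * Q[w] for w in Q)
E_P_XZ = sum(X[w] * Z[w] * P[w] for w in P)
print(Z["HH"], Z["HT"], Z["TH"], Z["TT"])    # 9/16 9/8 9/8 9/4
print(E_Q == E_P_XZ)                         # True, as in (5.2)
# 1/Z plays the role of dP/dQ, as in (5.3)
print(sum(X[w] / Z[w] * Q[w] for w in Q) == sum(X[w] * P[w] for w in P))   # True
```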



6 The binomial stock price process 

On the $n$-period coin toss space $\Omega_n$, with $\mathbb{T} = \{0, 1, \dots, n\}$, for $t \in \mathbb{T}$ denote by $\mathcal{F}_t$ the $\sigma$-algebra generated by the first $t$ coin tosses. Then $\mathcal{F}_t$ is a collection of subsets $A \subseteq \Omega = \Omega_n$ such that $\mathcal{F}_t$ is a $\sigma$-algebra, with the following property. If one has the results of the first $t$ coin tosses (but is not told the outcome $\omega$ of all $n$ coin tosses), then one can say whether $\omega \in A$ or $\omega \notin A$, for each $A \in \mathcal{F}_t$. Then $\mathbb{F} := (\mathcal{F}_t)_{t \in \mathbb{T}}$ is a filtration which records how information unfolds as one observes the results of successive coin tosses, as we will see in an example shortly.

The binomial model contains two assets. The first is a riskless asset (or cash account, or money market account, or bond), with price process $S^0 = (S^0_t)_{t \in \mathbb{T}}$. The second is a risky asset or stock, with price process $S = (S_t)_{t \in \mathbb{T}}$.

The process $S^0$ evolves according to

$$S^0_{t+1} = (1+r)S^0_t, \quad t = 0, 1, \dots, n-1, \qquad S^0_0 = 1,$$

where $r \ge 0$ is the one-period (assumed constant) interest rate. Hence we have

$$S^0_t = (1+r)^t, \quad t = 0, 1, \dots, n, \qquad (6.1)$$

and $S^0_t$ represents the value at time $t$ of one unit of currency (say \$1) invested at time zero.

Regarding the stock price process $S = (S_t)_{t \in \mathbb{T}}$: for each $t \in \mathbb{T}$, $S_t = S_t(\omega)$ (for $\omega \in \Omega$) is a one-dimensional random variable on the measurable space $(\Omega, \mathcal{F})$. It satisfies $S_t(\omega) = S_t(\omega_1 \cdots \omega_t)$, the stock price after $t$ coin tosses. The sequence of random variables $S = (S_t)_{t \in \mathbb{T}}$ is a stochastic process. We shall see that for each $t \in \mathbb{T}$, $S_t$ is $\mathcal{F}_t$-measurable, so that $S$ is an $\mathbb{F}$-adapted process. This encapsulates the idea that the information at time $t \in \mathbb{T}$, represented by $\mathcal{F}_t$, is sufficient to know the values of $S_s$ for all $s \le t$.

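Define two constants $u, d$ satisfying $u > 1 + r > d > 0$. The evolution of the stock price is given by (see Figure 6)

$$S_{t+1}(\omega) = \begin{cases} uS_t(\omega), & \text{if } \omega_{t+1} = H, \\ dS_t(\omega), & \text{if } \omega_{t+1} = T, \end{cases} \quad t = 0, 1, \dots, n-1.$$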
Figure 6: Binomial process for stock price. We have associated a probability $p \in (0, 1)$ with an upward stock price move.



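We also write

$$S_{t+1}(\omega_1 \cdots \omega_{t+1}) = \begin{cases} S_t(\omega_1 \cdots \omega_t)\,u, & \text{if } \omega_{t+1} = H, \\ S_t(\omega_1 \cdots \omega_t)\,d, & \text{if } \omega_{t+1} = T, \end{cases} \quad t = 0, 1, \dots, n-1,$$

whenever we wish to emphasise that $S_t$ actually depends only on the outcome of the first $t$ coin tosses. We sometimes abbreviate this notation further by suppressing the dependence on $\omega_1 \cdots \omega_t$ and writing

$$S_{t+1} = \begin{cases} uS_t, & \text{if } \omega_{t+1} = H, \\ dS_t, & \text{if } \omega_{t+1} = T. \end{cases}$$

At time $t \in \mathbb{T}$ the possible stock prices are $S_t^{(j)}$, given by

$$S_t^{(j)} = S_0u^jd^{t-j}, \quad j = 0, 1, \dots, t, \quad t \in \mathbb{T}.$$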
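The recursion for $S_{t+1}$ and the closed form $S_t^{(j)} = S_0u^jd^{t-j}$ can be cross-checked by brute force over all toss sequences. This minimal sketch assumes the illustrative values $S_0 = 4$, $u = 2$, $d = 1/2$. The helper names are not from the notes.

```python
from fractions import Fraction
from itertools import product

def stock_path(w, S0, u, d):
    """Prices S_0, S_1, ..., S_n along the toss sequence w (multiply by u on H, d on T)."""
    prices = [S0]
    for toss in w:
        prices.append(prices[-1] * (u if toss == "H" else d))
    return prices

def possible_prices(t, S0, u, d):
    """The t+1 possible prices at time t: S0 * u^j * d^(t-j), j = 0, ..., t."""
    return [S0 * u ** j * d ** (t - j) for j in range(t + 1)]

S0, u, d, n = 4, 2, Fraction(1, 2), 3
for t in range(n + 1):
    reached = {stock_path(w, S0, u, d)[t] for w in product("HT", repeat=n)}
    assert reached == set(possible_prices(t, S0, u, d))   # recursion matches closed form
print([float(x) for x in possible_prices(2, S0, u, d)])   # [1.0, 4.0, 16.0]
```

Because the tree recombines (an up move followed by a down move gives the same price as the reverse), only $t + 1$ distinct prices occur at time $t$, although there are $2^t$ paths.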

Example 6.1 (One-period binomial model). Let $n = 1$, so $\mathbb{T} = \{0, 1\}$, and $\Omega = \Omega_1$ is the finite set

$$\Omega_1 := \{H, T\},$$

the set of outcomes of a single coin toss. The stock price process is $(S_t)_{t \in \{0, 1\}}$, and $S_1(\omega)$ takes on two possible values, $S_1(H)$ or $S_1(T)$, given by

$$S_1(\omega) = \begin{cases} S_0u, & \text{if } \omega = H, \\ S_0d, & \text{if } \omega = T. \end{cases}$$



6.1 Information and conditional expectation in the binomial model 

A filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{T}}$ captures information flow as time marches forward. We illustrate this in a 3-period model. We also show that the binomial stock price process $(S_t)_{t \in \mathbb{T}}$ is adapted to the filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{T}}$ generated by the coin tosses, and give some examples of conditional expectation.

Example 6.2 (3-period binomial model). Let $n = 3$, so that $\Omega = \Omega_3$, given by the finite set

$$\Omega_3 = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\},$$

the set of all possible outcomes of three coin tosses.

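We can write down all the stock prices in a binomial tree. At times $t = 0, 1, 2, 3$ the possible prices are, respectively,

$$S_0; \qquad S_0u,\ S_0d; \qquad S_0u^2,\ S_0ud,\ S_0d^2; \qquad S_0u^3,\ S_0u^2d,\ S_0ud^2,\ S_0d^3.$$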




Define the following two subsets of $\Omega$:

$$A_H = \{HHH, HHT, HTH, HTT\}, \qquad A_T = \{THH, THT, TTH, TTT\},$$

corresponding to the events that the first coin toss results in H and T, respectively.

Let the coin have probability $p \in (0, 1)$ for H and $q := 1 - p$ for T. Using (2.2) and the definition $\mathbb{P}(A) := \sum_{\omega \in A} \mathbb{P}(\omega)$, we find

$$\mathbb{P}(A_H) = \mathbb{P}\{HHH, HHT, HTH, HTT\} = \mathbb{P}\{\text{H on first toss}\} = p,$$

$$\mathbb{P}(A_T) = \mathbb{P}\{THH, THT, TTH, TTT\} = \mathbb{P}\{\text{T on first toss}\} = q,$$

precisely in accordance with intuition.

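Here are two $\sigma$-algebras of subsets of $\Omega$:

$$\mathcal{F}_0 = \{\emptyset, \Omega\}, \qquad \mathcal{F}_1 = \{\emptyset, \Omega, A_H, A_T\}.$$

It is easy to see (exercise, or see Problem Sheet 0) that $\mathcal{F}_0$ and $\mathcal{F}_1$ are both $\sigma$-algebras.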

The $\sigma$-algebra $\mathcal{F}_1$ contains the "information of the first toss" or the "information up to time 1". If one has information on the first toss only, then one cannot say what the actual outcome $\omega = \omega_1\omega_2\omega_3$ of all three coin tosses is. With information up to time 1, all one knows is whether $\omega_1 = H$ or $\omega_1 = T$. In this case one can answer the question "is $\omega \in A$?" for every set $A$ in $\mathcal{F}_1$. One cannot answer a question such as "is $\omega \in \{HHH, HHT\}$?", because one would need to know the outcome of the first two tosses to answer it. This is why $\mathcal{F}_1 = \{\emptyset, A_H, A_T, \Omega\}$.

The general principle is: 

Fact 6.3. The $\sigma$-algebra $\mathcal{F}_t$ corresponding to information at time $t \in \mathbb{T}$ is composed of all the sets $A$ such that $\mathcal{F}_t$ is indeed a $\sigma$-algebra, and such that one can answer the question "is $\omega \in A$?" given information on the outcome of the first $t$ coin tosses.
The trivial $\sigma$-algebra $\mathcal{F}_0$ contains no information. Knowing whether the outcome $\omega$ of the three tosses is in $\emptyset$ (it is not) and whether it is in $\Omega$ (it is) tells you nothing about $\omega$. This accords with the idea that at time zero one knows nothing about the eventual outcome $\omega$ of the three coin tosses. All one can say is that $\omega \notin \emptyset$ and $\omega \in \Omega$, and so $\mathcal{F}_0 = \{\emptyset, \Omega\}$.

Define

$$A_{HH} = \{HHH, HHT\}, \qquad A_{HT} = \{HTH, HTT\},$$

$$A_{TH} = \{THH, THT\}, \qquad A_{TT} = \{TTH, TTT\},$$

corresponding to the events that the first two coin tosses result in HH, HT, TH and TT, respectively. Consider the collection of sets

$$\mathcal{F}_2 = \{\emptyset, \Omega, A_{HH}, A_{HT}, A_{TH}, A_{TT}, \text{ plus all unions of these}\}.$$

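Then $\mathcal{F}_2$ can be written as follows (this is confirmed in Problem Sheet 0):

$$\begin{aligned} \mathcal{F}_2 = \{&\emptyset, \Omega, A_{HH}, A_{HT}, A_{TH}, A_{TT}, A_H, A_T, \\ &A_{HH} \cup A_{TH},\ A_{HH} \cup A_{TT},\ A_{HT} \cup A_{TH},\ A_{HT} \cup A_{TT}, \\ &A^c_{HH}, A^c_{HT}, A^c_{TH}, A^c_{TT}\}. \end{aligned}$$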

Then $\mathcal{F}_2$ is indeed a $\sigma$-algebra (a tedious and lengthy verification). It contains the "information of the first two tosses" or the "information up to time 2". This is because, if you know the outcome of the first two coin tosses, you can say whether the outcome $\omega \in \Omega$ of all three tosses satisfies $\omega \in A$, for each $A \in \mathcal{F}_2$.
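The description of $\mathcal{F}_2$ as all unions of the four atoms $A_{HH}, A_{HT}, A_{TH}, A_{TT}$, together with the filtration property, can be confirmed by enumeration. The sketch below represents events as Python `frozenset`s of outcome strings. It relies on the fact that on a finite space, containing $\emptyset$ and being closed under complements and pairwise unions is enough for a $\sigma$-algebra. The helper names are illustrative.

```python
from itertools import combinations, product

omega = frozenset("".join(w) for w in product("HT", repeat=3))

def sigma_from_atoms(atoms):
    """All unions of atoms of a finite partition (including the empty union)."""
    sets = set()
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            sets.add(frozenset().union(*combo))
    return sets

def atoms_of_first(t):
    """Atoms of F_t: outcomes sharing the same first t tosses."""
    groups = {}
    for w in omega:
        groups.setdefault(w[:t], set()).add(w)
    return [frozenset(g) for g in groups.values()]

def is_sigma_algebra(sets):
    """On a finite space: contains the empty set, closed under complement and pairwise union."""
    return (frozenset() in sets
            and all(omega - A in sets for A in sets)
            and all(A | B in sets for A in sets for B in sets))

F = {t: sigma_from_atoms(atoms_of_first(t)) for t in range(4)}
print([len(F[t]) for t in range(4)])                  # [2, 4, 16, 256]
print(all(is_sigma_algebra(F[t]) for t in range(4)))  # True
print(all(F[t] <= F[t + 1] for t in range(3)))        # True: F is a filtration
```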

Similarly, $\mathcal{F}_3 = \mathcal{F}$, the set of all subsets of $\Omega$, contains "full information" about the outcome of all three tosses. The sequence of $\sigma$-algebras $\mathbb{F} = \{\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \mathcal{F}_3\}$ is a filtration.

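The stock price process $(S_t)_{t \in \mathbb{T}}$ is $\mathbb{F}$-adapted. That is, the value of the random variable $S_t$ is known after $t$ coin tosses. Equivalently, $S_t$ is $\mathcal{F}_t$-measurable for each $t \in \mathbb{T}$, meaning that the event

$$\{\omega \in \Omega : S_t(\omega) \in A\} = \{S_t \in A\}, \quad A \subseteq \mathbb{R},$$

is in $\mathcal{F}_t$.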

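At time zero, with $\mathcal{F}_0 = \{\Omega, \emptyset\}$, we must have that $S_0$ is a deterministic constant,

$$S_0(\omega) = a \in \mathbb{R}, \quad \forall \omega \in \Omega.$$

Then sets of the form $\{S_0 \in A\}$, $A \subseteq \mathbb{R}$, are clearly either $\Omega$ or $\emptyset$, so $S_0$ is $\mathcal{F}_0$-measurable. The random variable $S_1$ must be of the form

$$S_1(\omega) = \begin{cases} a \in \mathbb{R}, & \text{if } \omega \in A_H, \\ b \in \mathbb{R}, & \text{if } \omega \in A_T, \end{cases}$$

since the only information available at time 1 is whether $\omega_1 = H$ or $\omega_1 = T$. We notice that $S_1$ is indeed of this form.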

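Continuing to argue in this fashion, we find that at each time $t \in \mathbb{T}$, the event

$$\{\omega \in \Omega : S_t(\omega) \in A\} = \{S_t \in A\}, \quad A \subseteq \mathbb{R},$$

is in $\mathcal{F}_t$. The stochastic process $S = (S_t)_{t \in \mathbb{T}}$ is said to be adapted to the filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in \mathbb{T}}$, as in Definition 2.8.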
Now suppose $S_0 = 4$, $u = 2$ and $d = 1/2$. Then $S_0, S_1, S_2$ and $S_3$ are all random variables, denoted by $S_t(\omega)$ for $t = 0, 1, 2, 3$ and $\omega \in \Omega$. We may calculate the values of $S_2(\omega)$ for all $\omega \in \Omega$:

$$S_2(HHH) = S_2(HHT) = 16,$$

$$S_2(HTH) = S_2(HTT) = S_2(THH) = S_2(THT) = 4,$$

$$S_2(TTH) = S_2(TTT) = 1.$$

Now consider the preimage under the random variable $S_2$ of certain sets in $\mathbb{R}$. Specifically, consider the interval $[4, 29]$. The preimage under $S_2$ of this interval is

$$\{\omega \in \Omega : S_2(\omega) \in [4, 29]\}.$$

We can identify this subset of $\Omega$ with one of the sets in the list of sets in $\mathcal{F}_2$ given earlier:

$$\{\omega \in \Omega : S_2(\omega) \in [4, 29]\} = A^c_{TT}.$$

Suppose we list, in as minimal a fashion as possible, the subsets of $\Omega$ that we can get as preimages under $S_2$ of sets in $\mathbb{R}$, along with sets which can be built by taking unions of these. This collection of sets turns out to be a $\sigma$-algebra, the $\sigma$-algebra generated by the random variable $S_2$, denoted $\sigma(S_2)$.

Now, if $\omega \in A_{HH}$, then $S_2(\omega) = 16$. If $\omega \in A_{HT} \cup A_{TH}$, then $S_2(\omega) = 4$. If $\omega \in A_{TT}$, then $S_2(\omega) = 1$. Hence $\sigma(S_2)$ is composed of $\{\emptyset, \Omega, A_{HH}, A_{HT} \cup A_{TH}, A_{TT}\}$, plus all relevant unions and complements. Using the identities

$$A_{HH} \cup (A_{HT} \cup A_{TH}) = A^c_{TT},$$

$$A_{HH} \cup A_{TT} = (A_{HT} \cup A_{TH})^c,$$

$$(A_{HT} \cup A_{TH}) \cup A_{TT} = A^c_{HH},$$

we obtain

$$\sigma(S_2) = \{\emptyset, \Omega, A_{HH}, A_{HT} \cup A_{TH}, A_{TT}, A_{HH} \cup A_{TT}, A^c_{HH}, A^c_{TT}\}. \qquad (6.2)$$
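The atoms of $\sigma(S_2)$ are the level sets $\{S_2 = x\}$, and $\sigma(S_2)$ consists of all their unions. The following minimal sketch computes them for $S_0 = 4$, $u = 2$, $d = 1/2$. It also computes the preimage of $[4, 29]$. The helper `S` is illustrative, not notation from the notes.

```python
from fractions import Fraction
from itertools import combinations, product

S0, u, d = 4, 2, Fraction(1, 2)
omega = ["".join(w) for w in product("HT", repeat=3)]

def S(w, t):
    """Stock price after the first t tosses of w."""
    price = S0
    for toss in w[:t]:
        price *= u if toss == "H" else d
    return price

# Atoms of sigma(S_2): the preimages {S_2 = x} of the distinct values x.
atoms = {}
for w in omega:
    atoms.setdefault(S(w, 2), set()).add(w)
for value in sorted(atoms):
    print(value, sorted(atoms[value]))

# sigma(S_2) consists of all unions of these atoms.
atom_list = [frozenset(a) for a in atoms.values()]
sigma_S2 = {frozenset().union(*c) for k in range(len(atom_list) + 1)
            for c in combinations(atom_list, k)}
print(len(sigma_S2))      # 8, matching the list (6.2)

preimage = frozenset(w for w in omega if 4 <= S(w, 2) <= 29)
print(sorted(preimage))   # the complement of A_TT
```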

The information content of the $\sigma$-algebra $\sigma(S_2)$ is exactly the information learned by observing $S_2$. Suppose the coin is tossed three times and you do not know the outcome $\omega$, but you are told, for each set in $\sigma(S_2)$, whether $\omega$ is in the set. For instance, you might be told that $\omega$ is not in $A_{HH}$, is in $A_{HT} \cup A_{TH}$, and is not in $A_{TT}$. Then you know that in the first two throws there was a head and a tail, but not in which order they occurred. This is the same information you would have got by being told that the value of $S_2(\omega)$ is 4.

Note that $\mathcal{F}_2$ contains all the sets which are in $\sigma(S_2)$, and more. In other words, the information in the first two throws is greater than the information in $S_2$. In particular, if you see the first two tosses you can distinguish $A_{HT}$ from $A_{TH}$, but you cannot make this distinction from knowing the value of $S_2$ alone.

Let us give an example of a conditional expectation. Suppose we wish to estimate $S_1$ given $S_2$, and denote this estimate by $\mathbb{E}[S_1|S_2]$. This estimate should have two properties. (i) It should be a random variable, so it should depend on $\omega$: $\mathbb{E}[S_1|S_2] = \mathbb{E}[S_1|S_2(\omega)] = \mathbb{E}[S_1|S_2](\omega)$. (ii) It should be $\sigma(S_2)$-measurable: if the value of $S_2$ is known, then the value of $\mathbb{E}[S_1|S_2]$ should also be known.
In particular, if $\omega = HHH$ or $\omega = HHT$, then $S_2(\omega) = u^2S_0$ and, even without knowing $\omega$, we know that $S_1(\omega) = uS_0$. We define

$$\mathbb{E}[S_1|S_2](HHH) := \mathbb{E}[S_1|S_2](HHT) := uS_0.$$

In other words,

$$\mathbb{E}[S_1|S_2](\omega) = uS_0, \quad \omega \in A_{HH}.$$

Similarly we define

$$\mathbb{E}[S_1|S_2](TTT) := \mathbb{E}[S_1|S_2](TTH) := dS_0.$$

In other words,

$$\mathbb{E}[S_1|S_2](\omega) = dS_0, \quad \omega \in A_{TT}.$$



Finally, if $\omega \in A = A_{HT} \cup A_{TH} = \{HTH, HTT, THH, THT\}$, then $S_2(\omega) = udS_0$, so that $S_1(\omega) = uS_0$ or $S_1(\omega) = dS_0$. To get $\mathbb{E}[S_1|S_2]$ in this case, we therefore take a weighted average. For $\omega \in A$ we define

$$\mathbb{E}[S_1|S_2](\omega) := \frac{\int_A S_1\,d\mathbb{P}}{\mathbb{P}(A)},$$

which is a partial average of $S_1$ over the set $A$, normalised by the probability of $A$. Now, $\mathbb{P}(A) = 2pq$ and $\int_A S_1\,d\mathbb{P} = pq(u+d)S_0$, so that

$$\mathbb{E}[S_1|S_2](\omega) = \tfrac{1}{2}(u+d)S_0, \quad \omega \in A = A_{HT} \cup A_{TH}.$$

In other words, the best estimate of $S_1$, given that $S_2 = udS_0$, is the average of the possibilities $uS_0$ and $dS_0$. Then we have that

$$\int_A \mathbb{E}[S_1|S_2]\,d\mathbb{P} = \int_A S_1\,d\mathbb{P}.$$

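In conclusion, we can write

$$\mathbb{E}[S_1|S_2](\omega) = g(S_2(\omega)),$$

where

$$g(x) = \begin{cases} uS_0, & \text{if } x = u^2S_0, \\ \tfrac{1}{2}(u+d)S_0, & \text{if } x = udS_0, \\ dS_0, & \text{if } x = d^2S_0. \end{cases}$$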

In other words, $\mathbb{E}[S_1|S_2]$ is random only through its dependence on $S_2$, and hence is $\sigma(S_2)$-measurable. This random variable satisfies

1. $\mathbb{E}[S_1|S_2]$ is $\sigma(S_2)$-measurable;

2. For every $A \in \sigma(S_2)$,

$$\int_A \mathbb{E}[S_1|S_2]\,d\mathbb{P} = \int_A S_1\,d\mathbb{P},$$

which is the partial averaging property.
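The construction of $\mathbb{E}[S_1|S_2]$ by partial averaging over the level sets of $S_2$ can be reproduced in a few lines, and compared with the function $g$ above. The sketch assumes $S_0 = 4$, $u = 2$, $d = 1/2$ and an arbitrary $p = 1/3$. Here the result does not depend on $p$, because $A_{HT}$ and $A_{TH}$ both have probability $pq$. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

S0, u, d = 4, 2, Fraction(1, 2)
p = Fraction(1, 3)                      # illustrative; the answer does not depend on p
q = 1 - p
omega = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

def S(w, t):
    """Stock price after the first t tosses of w."""
    price = S0
    for toss in w[:t]:
        price *= u if toss == "H" else d
    return price

def cond_exp_given(X, Y):
    """E[X | Y] on a finite space: average X over each level set {Y = y}."""
    result = {}
    for y in {Y(w) for w in omega}:
        level = [w for w in omega if Y(w) == y]
        avg = sum(X(w) * P[w] for w in level) / sum(P[w] for w in level)
        result.update({w: avg for w in level})
    return result

E = cond_exp_given(lambda w: S(w, 1), lambda w: S(w, 2))

def g(x):
    """The function g from the text, so that E[S_1|S_2] = g(S_2)."""
    return {u**2 * S0: u * S0, u * d * S0: (u + d) * S0 / 2, d**2 * S0: d * S0}[x]

print(all(E[w] == g(S(w, 2)) for w in omega))   # True
```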
Here is another (simpler) example of conditional expectation in the 3-period binomial model. Recall the $\sigma$-algebra determined by the first toss, $\mathcal{F}_1 = \{\emptyset, \Omega, A_H, A_T\}$, where $A_H$ (respectively $A_T$) is the event corresponding to an H (respectively a T) on the first toss.

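Using the partial averaging property on the sets $A_H$ and $A_T$, we can show the (obvious) fact that

$$\mathbb{E}[S_2|\mathcal{F}_1](\omega) = (pu + qd)S_1(\omega).$$

The argument is as follows. $\mathbb{E}[S_2|\mathcal{F}_1]$ is constant on $A_H$ and on $A_T$, since it is $\mathcal{F}_1$-measurable. It must also satisfy the partial averaging property on these sets:

$$\int_{A_H} \mathbb{E}[S_2|\mathcal{F}_1]\,d\mathbb{P} = \int_{A_H} S_2\,d\mathbb{P}, \qquad \int_{A_T} \mathbb{E}[S_2|\mathcal{F}_1]\,d\mathbb{P} = \int_{A_T} S_2\,d\mathbb{P}.$$

The partial averaging property is obviously true on $\emptyset$, where all terms are zero. It will be true on $\Omega$ if it is true on $A_H$ and $A_T$, since $A_H \cup A_T = \Omega$. On $A_H$ we have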

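$$\int_{A_H} \mathbb{E}[S_2|\mathcal{F}_1]\,d\mathbb{P} = \mathbb{P}(A_H)\,\mathbb{E}[S_2|\mathcal{F}_1](\omega) = p\,\mathbb{E}[S_2|\mathcal{F}_1](\omega), \quad \forall \omega \in A_H,$$

since $\mathbb{E}[S_2|\mathcal{F}_1]$ is constant over $A_H$. On the other hand,

$$\int_{A_H} S_2\,d\mathbb{P} = p^2u^2S_0 + pqudS_0.$$

Hence

$$\mathbb{E}[S_2|\mathcal{F}_1](\omega) = pu^2S_0 + qudS_0 = (pu + qd)uS_0 = (pu + qd)S_1(\omega), \quad \forall \omega \in A_H.$$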
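Similarly, we can show that

$$\mathbb{E}[S_2|\mathcal{F}_1](\omega) = (pu + qd)S_1(\omega), \quad \forall \omega \in A_T.$$

So overall we get

$$\mathbb{E}[S_2|\mathcal{F}_1](\omega) = (pu + qd)S_1(\omega), \quad \forall \omega \in \Omega.$$

With $\mathcal{F}_0 = \{\emptyset, \Omega\}$, we can show similarly that

$$\mathbb{E}[S_1|\mathcal{F}_0] = (pu + qd)S_0.$$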

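The $\sigma$-algebra $\mathcal{F}_0$ contains no information, so any $\mathcal{F}_0$-measurable random variable must be constant (nonrandom). Therefore $\mathbb{E}[S_1|\mathcal{F}_0]$ is the constant which satisfies the averaging property

$$\int_\Omega \mathbb{E}[S_1|\mathcal{F}_0]\,d\mathbb{P} = \int_\Omega S_1\,d\mathbb{P} = \mathbb{E}[S_1] = (pu + qd)S_0,$$

and so we have

$$\mathbb{E}[S_1|\mathcal{F}_0] = (pu + qd)S_0.$$

We can generalise the above results to an $n$-period model.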






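Lemma 6.4. In an $n$-period binomial model, we have

$$\mathbb{E}[S_{t+1}|\mathcal{F}_t] = (pu + qd)S_t, \quad t = 0, 1, \dots, n-1. \qquad (6.3)$$

Proof. To show this, define, for any $t = 0, 1, \dots, n-1$, the random variable

$$X := \frac{S_{t+1}}{S_t}.$$

Then $X = u$ if $\omega_{t+1} = H$ and $X = d$ if $\omega_{t+1} = T$, and $X$ is independent of $\mathcal{F}_t$ because the coin tosses are independent. Hence, taking out what is known and using independence,

$$\mathbb{E}[S_{t+1}|\mathcal{F}_t] = \mathbb{E}[XS_t|\mathcal{F}_t] = S_t\,\mathbb{E}[X|\mathcal{F}_t] = S_t\,\mathbb{E}[X] = (pu + qd)S_t.$$

□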

Notice that the right-hand side of (6.3) depends only on the current stock price $S_t$, signifying that the stock price process is a Markov process.
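Lemma 6.4 can be verified directly by averaging $S_{t+1}$ over each atom of $\mathcal{F}_t$, that is, over all outcomes that share the same first $t$ tosses. The case $t = 1$ with $n = 3$ is the computation of $\mathbb{E}[S_2|\mathcal{F}_1]$ carried out above. In this sketch, the function name and the parameters in the usage line are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def check_lemma_6_4(n, S0, u, d, p):
    """Verify E[S_{t+1} | F_t] = (pu + qd) S_t on every atom of F_t by direct averaging."""
    q = 1 - p
    omega = ["".join(w) for w in product("HT", repeat=n)]
    P = {w: p ** w.count("H") * q ** w.count("T") for w in omega}

    def S(w, t):
        price = S0
        for toss in w[:t]:
            price *= u if toss == "H" else d
        return price

    for t in range(n):
        for prefix in {w[:t] for w in omega}:        # atoms of F_t
            atom = [w for w in omega if w[:t] == prefix]
            avg = sum(S(w, t + 1) * P[w] for w in atom) / sum(P[w] for w in atom)
            if avg != (p * u + q * d) * S(prefix, t):
                return False
    return True

print(check_lemma_6_4(4, 4, 2, Fraction(1, 2), Fraction(1, 3)))   # True
```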

7 Derivative valuation in the binomial model 

Take the standard $n$-period binomial stock price process $S$ of Section 6 on the filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t \in \mathbb{T}}, \mathbb{P})$, generated by $n$ coin tosses, with time index set $\mathbb{T} = \{0, 1, 2, \dots, n\}$. The sample space is finite, and the probability measure $\mathbb{P}$ is called the physical measure (or objective measure, or market measure). We assume $\mathcal{F}_n = \mathcal{F}$ and $\mathcal{F}_0 = \{\emptyset, \Omega\}$.

The sample space is the set of all outcomes of $n$ coin tosses, so each $\omega \in \Omega$ is of the form $\omega = (\omega_1\omega_2 \cdots \omega_n)$, with $\omega_t \in \{H, T\}$ for each $t \in \{1, \dots, n\}$. The evolution of the stock price process $S = (S_t)_{t=0}^n$ is given by

$$S_{t+1} = \begin{cases} S_tu, & \text{if } \omega_{t+1} = H, \\ S_td, & \text{if } \omega_{t+1} = T, \end{cases} \quad t = 0, 1, \dots, n-1,$$

where $u > 1 + r > d > 0$.

Introduce a financial agent with initial wealth $X_0$ at time zero, who can choose at each time how to split his wealth between the riskless and risky assets. The agent's trading strategy is the two-dimensional stochastic process

$$(\pi^0_t, \pi_t), \quad t \in \{1, \dots, n\}.$$

Here, for $t \in \{1, \dots, n\}$, $\pi^0_t$ denotes the number of units of the riskless asset held over the interval $[t-1, t)$, and $\pi_t$ denotes the number of units of the stock held over the interval $[t-1, t)$. The positions in the portfolio at time $t$, for $t \in \{1, \dots, n\}$, are decided at time $t-1$ and kept until time $t$, when new asset price quotations are available.

Assumption 7.1. The portfolio process $(\pi^0_t, \pi_t)_{t \in \{1, \dots, n\}}$ is predictable, so that for each $t \in \{1, \dots, n\}$, $\pi^0_t$ and $\pi_t$ are $\mathcal{F}_{t-1}$-measurable.
The initial wealth is given by

$$X_0 = \pi^0_1 + \pi_1S_0. \quad \text{(budget constraint)} \qquad (7.1)$$

Equation (7.1) is a budget constraint, stating that the agent splits all his initial wealth between cash and the risky stock. The time 1 wealth is

$$X_1 = (1+r)\pi^0_1 + \pi_1S_1, \qquad (7.2)$$

where we have assumed that no wealth has been taken out of the portfolio (say, for consumption) and no outside income has been injected into it. Equation (7.2) is thus one form of a self-financing condition on the evolution of portfolio wealth. Using the budget constraint (7.1) we recast (7.2) into the form

$$X_1 = (1+r)X_0 + \pi_1\big(S_1 - (1+r)S_0\big). \qquad (7.3)$$

Similar self-financing portfolio rebalancing occurs at each time $t \in \{1, \dots, n-1\}$. Define the wealth process $X = (X_t)_{t \in \mathbb{T}}$. For $t = 0, 1, \dots, n-1$, $X_t$ denotes the wealth at time $t$, that is, at the end of the interval $[t-1, t)$ and the beginning of the interval $[t, t+1)$. $X_n$ is the final wealth at the end of the interval $[n-1, n]$. We then have the following evolution.

At the beginning of the interval $[t-1, t)$, just after portfolio rebalancing has taken place, the wealth is $X_{t-1}$, given by

$$X_{t-1} = \pi^0_tS^0_{t-1} + \pi_tS_{t-1} = \pi^0_t(1+r)^{t-1} + \pi_tS_{t-1}, \qquad (7.4)$$

where the last equality follows from the expression (6.1) for the value of the riskless asset at any time in $\mathbb{T}$. The position $(\pi^0_t, \pi_t)$ is held over $[t-1, t)$. The wealth $X_t$ achieved at the end of this interval (and hence at the start of the interval $[t, t+1)$) is

$$X_t = \pi^0_tS^0_t + \pi_tS_t = \pi^0_t(1+r)^t + \pi_tS_t. \qquad (7.5)$$

At this time $t$, the portfolio is rebalanced to $(\pi^0_{t+1}, \pi_{t+1})$, so that $X_t$ is also given by

$$X_t = \pi^0_{t+1}S^0_t + \pi_{t+1}S_t.$$

Hence the general self-financing condition is

$$\pi^0_{t+1}S^0_t + \pi_{t+1}S_t = \pi^0_tS^0_t + \pi_tS_t, \quad t = 1, \dots, n-1.$$

We can raise this to a definition.

Definition 7.2. A trading strategy $(\pi^0_t, \pi_t)_{t=1}^n$ is self-financing if

$$\pi^0_{t+1}S^0_t + \pi_{t+1}S_t = \pi^0_tS^0_t + \pi_tS_t, \quad t = 1, \dots, n-1.$$

Using (7.4) to eliminate $\pi^0_t$ from (7.5), we can write the portfolio wealth evolution as

$$X_t = (1+r)X_{t-1} + \pi_t\big(S_t - (1+r)S_{t-1}\big), \quad t = 1, \dots, n. \qquad (7.6)$$
This can be put into a neater form if we work with discounted quantities, that is, if we express all quantities in units of the bond price. The discounted stock price process $\tilde S$ is defined by

$$\tilde S_t := \frac{S_t}{S^0_t} = \frac{S_t}{(1+r)^t}, \quad t = 0, 1, \dots, n,$$

and the discounted wealth process is similarly defined by

$$\tilde X_t := \frac{X_t}{S^0_t} = \frac{X_t}{(1+r)^t}, \quad t = 0, 1, \dots, n.$$

Then, in terms of discounted quantities, the wealth evolution equation (7.6) becomes

$$\tilde X_t = \tilde X_{t-1} + \pi_t(\tilde S_t - \tilde S_{t-1}), \quad t = 1, \dots, n. \qquad (7.7)$$

Iterating this evolution from time zero to $t \in \mathbb{T}$ we obtain

$$\tilde X_t = X_0 + \sum_{s=1}^t \pi_s(\tilde S_s - \tilde S_{s-1}), \quad t = 1, \dots, n. \qquad (7.8)$$

From this we see that the wealth process is completely specified by the initial wealth $X_0$ and the choice of stock portfolio $\pi$. When we need to emphasise the dependence of wealth on the chosen portfolio, we write $X(\pi) = X$.

The sum in (7.8) is the (discrete-time) stochastic integral of $\pi$ with respect to $\tilde S$, denoted by $(\pi \cdot \tilde S)$:

$$(\pi \cdot \tilde S)_t := \sum_{s=1}^t \pi_s(\tilde S_s - \tilde S_{s-1}), \quad t = 1, \dots, n.$$
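The equivalence of (7.6), (7.8) and the self-financing condition of Definition 7.2 can be checked along a single path. In the sketch below, the parameters, the initial wealth and the stock-holding rule `pi` are hypothetical. `pi` only looks at tosses already observed, so the strategy is predictable as in Assumption 7.1. The bond holdings $\pi^0_t$ are recovered from (7.4).

```python
from fractions import Fraction

S0, u, d, r = Fraction(4), Fraction(2), Fraction(1, 2), Fraction(1, 4)   # illustrative
X0 = Fraction(10)                                                   # illustrative initial wealth

def pi(history):
    """Hypothetical predictable stock holding: depends only on tosses seen so far."""
    return Fraction(1 + history.count("H"), 2)

def simulate(w):
    """Return stock prices S_t, wealth X_t from (7.6), and holdings pi_1..pi_n along w."""
    S, X, holdings = [S0], [X0], []
    for t, toss in enumerate(w, start=1):
        pi_t = pi(w[:t - 1])                      # decided at time t-1
        S.append(S[-1] * (u if toss == "H" else d))
        X.append((1 + r) * X[-1] + pi_t * (S[t] - (1 + r) * S[t - 1]))   # (7.6)
        holdings.append(pi_t)
    return S, X, holdings

S, X, holdings = simulate("HTTH")
B = [(1 + r) ** t for t in range(len(S))]          # bond price S^0_t, as in (6.1)
S_disc = [S[t] / B[t] for t in range(len(S))]
X_disc = [X[t] / B[t] for t in range(len(X))]

# (7.8): discounted wealth = X0 + discrete stochastic integral (pi . S_disc)_t
integral = [sum(holdings[s - 1] * (S_disc[s] - S_disc[s - 1]) for s in range(1, t + 1))
            for t in range(len(S))]
ok_78 = all(X_disc[t] == X0 + integral[t] for t in range(len(S)))

# Bond holdings pi0[t-1] = pi^0_t from (7.4), then the condition of Definition 7.2
pi0 = [(X[t - 1] - holdings[t - 1] * S[t - 1]) / B[t - 1] for t in range(1, len(S))]
ok_sf = all(pi0[t] * B[t] + holdings[t] * S[t] == pi0[t - 1] * B[t] + holdings[t - 1] * S[t]
            for t in range(1, len(pi0)))
print(ok_78, ok_sf)   # True True
```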

7.1 Equivalent martingale measures and no arbitrage 

Definition 7.3 (Equivalent martingale measure). An equivalent martingale measure (EMM), also called a risk-neutral measure, is a probability measure $\mathbb{Q} \sim \mathbb{P}$ such that the discounted stock price $\tilde S$ is a $\mathbb{Q}$-martingale.

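Lemma 7.4. If a martingale measure $\mathbb{Q}$ exists, then the discounted wealth process of a self-financing portfolio process $\pi$ is a $\mathbb{Q}$-martingale.

Proof. If a martingale measure $\mathbb{Q}$ exists, we have $\mathbb{E}^{\mathbb{Q}}[\tilde S_t|\mathcal{F}_{t-1}] = \tilde S_{t-1}$. Then, from (7.7), since $\pi_t$ is $\mathcal{F}_{t-1}$-measurable, we obtain

$$\mathbb{E}^{\mathbb{Q}}[\tilde X_t|\mathcal{F}_{t-1}] = \tilde X_{t-1} + \pi_t\left(\mathbb{E}^{\mathbb{Q}}[\tilde S_t|\mathcal{F}_{t-1}] - \tilde S_{t-1}\right) = \tilde X_{t-1}, \quad t = 1, \dots, n,$$

so that the discounted wealth process is also a $\mathbb{Q}$-martingale.

□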

Remark 7.5. Lemma 7.4 also follows from the fact that the discounted wealth process is the constant $X_0$ plus a stochastic integral, that is, a martingale transform (recall Proposition 4.5). For any self-financing strategy $\pi$, the discounted wealth process is given by (7.8). On a finite sample space every predictable process is bounded. Combining (7.8) with Proposition 4.5, applied under $\mathbb{Q}$ with $M = \tilde S$, shows that the discounted wealth process $\tilde X$ is a $\mathbb{Q}$-martingale.
7.1.1 The risk-neutral measure in the binomial model

In the binomial model, there is a unique EMM Q defined as follows. For t G {1, . . . , n}, 
define 

Q(wt = H) = P ® := 1±^, Q(uh = T) = ,Q := " = (1 ± $ . (7.9) 
w — a u — a 

It is clear that Q ~ P, and that E Q [S t+1 \ ? t \ = (l+r)S t , so that Q is indeed an EMM. We will 
see shortly that Q emerges naturally when we try to value derivatives in the binomial model 
via a replication argument. This is one manifestation of deep results called the Fundamental 
Theorems of Asset Pricing which hold in great generality (but are a little off-syllabus). We 
mention these theorems in passing. 
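A small check of (7.9), assuming the parameters later used in Example 7.15 ($u = 2$, $d = 1/2$, $r = 1/4$) and a current price $S_t = 4$. The helper `risk_neutral_probs` is hypothetical; it rejects parameters outside $d < 1 + r < u$, since otherwise $p^Q$ or $q^Q$ would not lie strictly in $(0, 1)$.

```python
from fractions import Fraction

def risk_neutral_probs(u, d, r):
    """p^Q and q^Q from (7.9); requires d < 1 + r < u so both lie in (0, 1)."""
    if not d < 1 + r < u:
        raise ValueError("need d < 1 + r < u for an equivalent martingale measure")
    pQ = (1 + r - d) / (u - d)
    qQ = (u - (1 + r)) / (u - d)
    return pQ, qQ

u, d, r, S = Fraction(2), Fraction(1, 2), Fraction(1, 4), Fraction(4)
pQ, qQ = risk_neutral_probs(u, d, r)
print(pQ, qQ)                                  # 1/2 1/2
# Martingale check: E^Q[S_{t+1} | F_t] = (1 + r) S_t.
print(pQ * u * S + qQ * d * S == (1 + r) * S)  # True
```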



7.1.2 Fundamental theorems of asset pricing 

Recall the definition of arbitrage. 

Definition 7.6 (Arbitrage). An arbitrage over $T = \{0, 1, \ldots, n\}$ is a strategy $\pi$ such that
the associated wealth process satisfies $X_0 = 0$, $P[X_n(\pi) \geq 0] = 1$ and $P[X_n(\pi) > 0] > 0$.

Theorem 7.7 (First Fundamental Theorem of Asset Pricing (FTAP I)). A finite sample
space, discrete-time financial market is arbitrage-free if and only if there exists an equivalent
martingale measure.

Remark 7.8. The proof of Theorem 7.7 is easy in one direction: suppose there exists an
equivalent martingale measure $Q$. Then for any self-financing strategy $\pi$ we have, from
Lemma 7.4, that the discounted wealth process is a $Q$-martingale, so $E^Q[\tilde X_n] = X_0$. This
immediately precludes the possibility of arbitrage. For suppose $X$ is such that $X_0 = 0$
and $X_n \geq 0$ $P$-a.s., so that $X_n \geq 0$ $Q$-a.s. (since $P$ and $Q$ are equivalent). But since
$E^Q[\tilde X_n] = X_0 = 0$, it must be the case that $X_n = 0$, $Q$-a.s. This implies that $X_n = 0$ $P$-a.s.,
since $P$ and $Q$ are equivalent, so $P[X_n > 0] = 0$ and there is no arbitrage.

Definition 7.9. A European contingent claim with expiration time $n$ is a non-negative
$\mathcal F_n$-measurable random variable $Y$, which is called the payoff of the claim.

Definition 7.10. A European contingent claim $Y$ is said to be attainable (or hedgeable or
replicable) if there exists a constant $X_0$ and a portfolio process $\pi = (\pi_t)_{t=1}^n$ such that the
self-financing wealth process $(X_t)_{t=0}^n$ satisfies

$$X_n(\omega) = Y(\omega), \qquad \forall \omega \in \Omega.$$

In this case, for $t = 0, \ldots, n$, we call $V_t := X_t$ the no-arbitrage price at time $t$ of $Y$, and the
portfolio which attains $X_n = Y$ is called the replicating portfolio for the claim.

Definition 7.11 (Complete market). A financial market is said to be complete if every 
contingent claim is attainable. Otherwise, the market is said to be incomplete. 

Theorem 7.12 (Second Fundamental Theorem of Asset Pricing (FTAP II)). A finite-state
discrete-time arbitrage-free market is complete if and only if there is a unique equivalent
martingale measure.
Here are some examples of European claims in a discrete-time setting with time index
set $T = \{0, 1, \ldots, n\}$.

• A European call option, with payoff $Y = (S_n - K)^+$ for fixed strike $K > 0$.

• A European put option, with payoff $Y = (K - S_n)^+$ for fixed strike $K > 0$.

• A fixed strike lookback call option, with payoff $Y = (M_n - K)^+$ for fixed strike $K > 0$,
where $M_n$ is the maximum of the stock price over $\{0, 1, \ldots, n\}$, that is $M_n = \max_{t \in T} S_t$.

• A floating strike lookback call option, with payoff $Y = (S_n - m_n)^+$, where $m_n$ is the
minimum of the stock price over $\{0, 1, \ldots, n\}$, that is $m_n = \min_{t \in T} S_t$.

• An arithmetic average fixed strike Asian call option, with payoff $Y = (A_n - K)^+$
for fixed strike $K > 0$, where $A_n$ is the arithmetic average of the stock price over
$\{0, 1, \ldots, n\}$, that is

$$A_n = \frac{1}{n+1}\sum_{t=0}^{n} S_t.$$
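For concreteness, the payoffs above can be evaluated on a single stock path with a short Python function. This is a sketch only: the function name and the sample path $S = (4, 8, 4, 8)$ (consistent with $u = 2$, $d = 1/2$) are illustrative assumptions, and the Asian average uses the $n + 1$ dates $\{0, 1, \ldots, n\}$.

```python
def claim_payoffs(path, K):
    """Payoffs of the European claims listed above for one stock path S_0, ..., S_n."""
    S_n = path[-1]
    M_n = max(path)               # maximum over {0, ..., n}
    m_n = min(path)               # minimum over {0, ..., n}
    A_n = sum(path) / len(path)   # arithmetic average over the n + 1 dates
    return {
        "call": max(S_n - K, 0),
        "put": max(K - S_n, 0),
        "fixed_lookback_call": max(M_n - K, 0),
        "floating_lookback_call": max(S_n - m_n, 0),
        "asian_call": max(A_n - K, 0),
    }

print(claim_payoffs([4, 8, 4, 8], K=5))
```

Only the call and put depend on $S_n$ alone; the other three need the whole path, which matters when choosing a pricing algorithm.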

In a complete market, all contingent claims are attainable. So, given a contingent claim
$Y$, there is a unique trading strategy $\pi$ with wealth process $X = (X_t)_{t \in T}$ such that $X_n = Y$
almost surely. This immediately implies that, to avoid arbitrage, the price of the claim at
any time $t \in T$ must be $V_t := X_t$, as in Definition 7.10.

Denote the discount factor from time $t \in T$ to time zero by $D_t$. So, with constant interest
rate $r$, $D_t = (1+r)^{-t} = 1/S^0_t$.

Lemma 7.13. The no-arbitrage price of an attainable claim $Y$ is given by

$$V_t = \frac{1}{D_t}E^Q[D_nY \mid \mathcal F_t], \qquad t \in T. \qquad (7.10)$$

Any other price for the claim will lead to an arbitrage opportunity. 

Proof. Let $X = (X_t)_{t \in T}$ be the wealth process of the replicating strategy. The discounted
wealth process $\tilde X = DX$ is a $Q$-martingale (Lemma 7.4), so it satisfies

$$E^Q[D_n X_n \mid \mathcal F_t] = D_t X_t, \qquad t \leq n.$$

Using $X_n = Y$ and the definition $V_t := X_t$ yields (7.10).

To show that there is arbitrage if (7.10) is violated, consider buying or selling the claim
at time zero (and a similar argument would hold at any time $t \in T$).

First, suppose $V_0 > E^Q[D_nY]$. Sell the claim for $V_0$ and use the proceeds to invest in the
replicating portfolio, which requires an initial investment of $X_0 = E^Q[D_nY]$. The wealth in
this portfolio at time $n$ is given by $X_n = Y$, by assumption. Therefore, one can, at time
zero, invest $V_0 - X_0 > 0$ in the bank, use the proceeds from the replicating portfolio to settle
one's obligations from the claim, and make a riskless profit of $(V_0 - X_0)(1+r)^n > 0$. This is
an arbitrage.

Similarly, if $V_0 < E^Q[D_nY]$, one buys the claim and sells the replicating portfolio short, leading
to a riskless profit of $(X_0 - V_0)(1+r)^n > 0$.

□ 
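For small $n$, (7.10) at $t = 0$ can be evaluated by brute force: enumerate all $2^n$ coin-toss sequences, weight each by its $Q$-probability, and discount. The sketch below assumes i.i.d. tosses under $Q$ with probabilities (7.9); the function name is hypothetical. Because it works path by path, it also handles path-dependent payoffs such as the lookback and Asian claims listed above.

```python
from fractions import Fraction
from itertools import product

def price_by_enumeration(payoff, S0, u, d, r, n):
    """V_0 = E^Q[D_n Y] from (7.10), summing over all 2**n coin-toss paths.

    payoff maps the stock path [S_0, ..., S_n] to Y, so path-dependent
    claims are allowed. Tosses are i.i.d. under Q with probabilities (7.9).
    """
    pQ = (1 + r - d) / (u - d)
    qQ = 1 - pQ
    expectation = Fraction(0)
    for tosses in product("HT", repeat=n):
        path, prob = [S0], Fraction(1)
        for w in tosses:
            path.append(path[-1] * (u if w == "H" else d))
            prob *= pQ if w == "H" else qQ
        expectation += prob * payoff(path)
    return expectation / (1 + r) ** n          # multiply by D_n = (1 + r)**(-n)

S0, u, d, r = Fraction(4), Fraction(2), Fraction(1, 2), Fraction(1, 4)
call = price_by_enumeration(lambda p: max(p[-1] - 5, 0), S0, u, d, r, 2)
lookback = price_by_enumeration(lambda p: max(p[-1] - min(p), 0), S0, u, d, r, 2)
print(call)    # 44/25
```

With the call and parameters of Example 7.15 it returns $44/25$. The cost grows like $2^n$, so this is only a reference implementation for checking faster lattice methods.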
7.2 Pricing by replication in the binomial model

In this section we shall show that in the binomial model it is possible to replicate any
European claim, so the model is complete. If we were to rely on the second FTAP (Theorem 7.12) we could
deduce this immediately, since the EMM $Q$ defined via (7.9), repeated below, is unique:

$$Q(\omega_t = H) = p^Q := \frac{1+r-d}{u-d}, \qquad Q(\omega_t = T) = q^Q := \frac{u-(1+r)}{u-d}, \qquad (7.11)$$

for $t \in \{1, \ldots, n\}$. We will explicitly construct a replicating strategy and see that this measure
emerges naturally.

7.2.1 Replication in a one-period binomial model 

First consider a one-period model, $n = 1$, so $T = \{0, 1\}$. Suppose an agent sells a claim on
the stock at time zero that expires at time 1. There are just two points $\omega \in \Omega$, given by
$\omega = H$ and $\omega = T$.

The claim pays off an amount $Y$ at time 1, where $Y$ is an $\mathcal F_1$-measurable random variable.
This measurability condition is relevant; it says that the value of the claim at its maturity
date is determined by the coin toss, that is, by the value of the stock price at time 1. This is
why it does not make sense to use some stock unrelated to the derivative security in valuing
it.

The agent sells the claim at time zero for some price $V_0$ (to be determined) and attempts
to manage the risk from this sale by building a hedging portfolio composed of a number
$\pi_1 = \pi$ of shares of the underlying stock and a number $\pi^0_1 = \pi^0$ of shares of the riskless asset
(which has initial value $S^0_0 = 1$).

We suppose that the proceeds from the sale of the claim, $V_0$, are all that the agent uses
to construct a hedging portfolio. Therefore the initial wealth in the hedging portfolio is

$$X_0 = \pi^0 + \pi S_0. \qquad (7.12)$$

As the stock price evolves in time the hedging portfolio and option value will also evolve. The
option payoff is the random variable $Y(\omega)$ (so for, say, a call option, $Y(\omega) = (S_1(\omega) - K)^+$,
where $K$ is the option's strike and $S_1(\omega)$ is the stock price after one coin toss). The agent's
hedge portfolio wealth at time 1 is $X_1(\omega)$, given by

$$X_1(\omega) = (1+r)\pi^0 + \pi S_1(\omega).$$

Eliminating $\pi^0$ using (7.12), we write $X_1$ as

$$X_1(\omega) = (1+r)X_0 + \pi(S_1(\omega) - (1+r)S_0).$$

If the hedging portfolio is to successfully manage the risk from the option sale, its value must
replicate the option payoff in each possible final state, so that we require $X_1(\omega) = Y(\omega)$ for
$\omega = H$ and $\omega = T$, yielding the equations

$$(1+r)X_0 + \pi(S_1(H) - (1+r)S_0) = Y(H), \qquad \text{if } \omega = H, \qquad (7.13)$$
$$(1+r)X_0 + \pi(S_1(T) - (1+r)S_0) = Y(T), \qquad \text{if } \omega = T. \qquad (7.14)$$
Solving these equations for $\pi$ gives

$$\pi = \frac{Y(H) - Y(T)}{S_1(H) - S_1(T)}.$$

Then, the initial wealth is computed from either of (7.13) or (7.14) as

$$X_0 = \frac{1}{1+r}\left(p^Q Y(H) + q^Q Y(T)\right), \qquad (7.15)$$

where we have used (7.11) and $S_1(H) = uS_0$, $S_1(T) = dS_0$. (The cash position required can
be obtained using (7.12).)

If the agent holds the portfolio $(\pi^0, \pi)$, then he will be able to meet all his obligations
associated with the claim. Therefore the current claim price is the initial wealth required to
do this, $V_0 = X_0$, as given by (7.15). So we get the claim price at time zero as

$$V_0 = E^Q\left[\frac{Y}{1+r}\right] = \frac{1}{1+r}\left(p^Q Y(H) + q^Q Y(T)\right).$$

The measure Q is the unique EMM for this one-period market, and is also known as the 
risk-neutral probability measure. 

It is clear that $Q$ is equivalent to the physical measure $P$, and that $Q$ is indeed a martingale
measure, in that

$$E^Q\left[\frac{S_1}{1+r}\right] = S_0.$$



It is also clear that the discounted wealth process, and hence the discounted claim price 
process, is also a Q-martingale, just as we would have expected from the FTAPs. 
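The one-period replication argument can be checked mechanically. The following sketch (a hypothetical helper, using exact `Fraction` arithmetic) computes $\pi$, the bond holding $\pi^0$ from (7.12) and $X_0$ from (7.15), then confirms that the portfolio pays $Y(H)$ and $Y(T)$. The call with strike $K = 5$ and $S_0 = 4$, $u = 2$, $d = 1/2$, $r = 1/4$ is an assumed example.

```python
from fractions import Fraction

def one_period_hedge(S0, u, d, r, Y_H, Y_T):
    """Return (pi, pi0, X0) replicating payoffs Y(H), Y(T) in one period."""
    S1_H, S1_T = u * S0, d * S0
    pi = (Y_H - Y_T) / (S1_H - S1_T)            # stock holding
    pQ = (1 + r - d) / (u - d)
    X0 = (pQ * Y_H + (1 - pQ) * Y_T) / (1 + r)  # (7.15)
    pi0 = X0 - pi * S0                          # bond holding from (7.12), S^0_0 = 1
    return pi, pi0, X0

S0, u, d, r = Fraction(4), Fraction(2), Fraction(1, 2), Fraction(1, 4)
K = Fraction(5)
Y_H, Y_T = max(u * S0 - K, 0), max(d * S0 - K, 0)   # call payoffs: 3 and 0
pi, pi0, X0 = one_period_hedge(S0, u, d, r, Y_H, Y_T)
# Replication in both states: X_1 = (1 + r) pi0 + pi S_1.
assert (1 + r) * pi0 + pi * u * S0 == Y_H
assert (1 + r) * pi0 + pi * d * S0 == Y_T
print(pi, pi0, X0)    # 1/2 -4/5 6/5
```

The negative bond holding means the seller borrows $4/5$ to buy half a share.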

7.2.2 Generalisation to n-period binomial model 

We can easily generalise the above analysis to an n-period model, by simply concatenating 
a sequence of 1-period models. 

Let us place ourselves at some time $t-1$, where $t \in \{1, \ldots, n\}$. Given a fixed outcome
$\omega_1\ldots\omega_{t-1}$ of the first $t-1$ coin tosses, suppose that the values of the stock and a derivative
security at time $t$ are $S_t(\omega_t)$, $V_t(\omega_t)$ respectively, if the outcome of the $t$-th coin toss is $\omega_t$ (see
Figure 7). Then one can trade over $[t-1, t]$ to reproduce the values of the derivative one
period later, as follows.

At time $t-1$, after portfolio rebalancing has taken place, the wealth with strategy
$(\pi^0, \pi) = (\pi^0_t, \pi_t)_{t=1}^n$ is given by

$$X_{t-1} = \pi^0_t S^0_{t-1} + \pi_t S_{t-1}. \qquad (7.16)$$

This evolves to the wealth $X_t(\omega_t)$ at time $t$, given by

$$X_t(\omega_t) = \pi^0_t(1+r)S^0_{t-1} + \pi_t S_{t-1}\varepsilon_t, \qquad \varepsilon_t = \begin{cases} u, & \omega_t = H, \\ d, & \omega_t = T. \end{cases}$$
[Figure 7: Binomial process for stock price and derivative. From $(S_{t-1}, V_{t-1})$ the tree branches to $(S_t(H) = S_{t-1}u,\ V_t(H))$ and $(S_t(T) = S_{t-1}d,\ V_t(T))$.]

Eliminating $\pi^0_t S^0_{t-1}$ using (7.16) we get

$$X_t(\omega_t) = (1+r)X_{t-1} + \pi_t S_{t-1}(\varepsilon_t - (1+r)), \qquad \omega_t = H, T.$$

Writing this out fully as two equations, we have

$$X_t(H) = (1+r)X_{t-1} + \pi_t S_{t-1}(u - (1+r)), \qquad (7.17)$$
$$X_t(T) = (1+r)X_{t-1} - \pi_t S_{t-1}(1 + r - d). \qquad (7.18)$$

We require $X_t(\omega_t) = V_t(\omega_t)$, for both $\omega_t = H$ and $\omega_t = T$. This requires that the stock
holding at the beginning of the interval $[t-1, t)$ must be

$$\pi_t = \frac{V_t(H) - V_t(T)}{S_{t-1}(u-d)} = \frac{V_t(H) - V_t(T)}{S_t(H) - S_t(T)}.$$

The required wealth at time $t-1$ is then given from either of (7.17) or (7.18) as

$$X_{t-1} = \frac{1}{1+r}\left(p^Q V_t(H) + q^Q V_t(T)\right) = E^Q[(1+r)^{-1}V_t \mid \mathcal F_{t-1}], \qquad t = 1, \ldots, n.$$

For no-arbitrage, the derivative value at time $t-1$ must then be given by
$V_{t-1} = X_{t-1}$:

$$V_{t-1} = E^Q[(1+r)^{-1}V_t \mid \mathcal F_{t-1}], \qquad t = 1, \ldots, n. \qquad (7.19)$$

Notice that this implies that the discounted option value $((1+r)^{-t}V_t)_{t=0}^n$ is a $Q$-martingale
(as it should be, since it is replicated by a discounted wealth process which is a $Q$-martingale).

This shows that one can always find a strategy at any time to reproduce the value of a
contingent claim one period later. The key to valuing the contingent claim is thus to begin
at the maturity time and work backwards, computing risk-neutral discounted expectations.
The next section formalises this.
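The backward recursion (7.19) is straightforward to implement on the recombining lattice, where node $(t, j)$ (with $j$ up-moves) carries the price $S_0u^jd^{t-j}$. The sketch below assumes the payoff depends only on $S_n$ (path-dependent claims need the full tree instead), and the function name is illustrative.

```python
from fractions import Fraction

def european_binomial(payoff, S0, u, d, r, n):
    """Backward recursion (7.19) on the recombining binomial lattice.

    Node (t, j) has j up-moves, so S_t = S0 * u**j * d**(t - j).
    Assumes the payoff depends only on S_n.
    """
    pQ = (1 + r - d) / (u - d)
    qQ = 1 - pQ
    V = [payoff(S0 * u**j * d**(n - j)) for j in range(n + 1)]   # V_n = Y
    for t in range(n - 1, -1, -1):
        # V_t(j) = [p^Q V_{t+1}(j+1) + q^Q V_{t+1}(j)] / (1 + r)
        V = [(pQ * V[j + 1] + qQ * V[j]) / (1 + r) for j in range(t + 1)]
    return V[0]

S0, u, d, r, K, n = Fraction(4), Fraction(2), Fraction(1, 2), Fraction(1, 4), Fraction(5), 2
call = european_binomial(lambda s: max(s - K, 0), S0, u, d, r, n)
put = european_binomial(lambda s: max(K - s, 0), S0, u, d, r, n)
print(call, put)    # 44/25 24/25
```

Recombination means only $t + 1$ values are stored at time $t$, instead of $2^t$. A convenient consistency check is put-call parity, $C_0 - P_0 = S_0 - K(1+r)^{-n}$, which holds exactly here.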

7.3 Completeness of the multiperiod binomial model 

The above analysis can clearly be iterated so that in a multiperiod binomial model, we can
replicate any contingent claim. The next theorem rigorously demonstrates that a portfolio
process to hedge any contingent claim in the binomial model exists, and derives an expression
for $\pi_t$, $t = 1, \ldots, n$.

Define the unique EMM $Q$ by setting the $Q$-probability of H on each coin toss to be
$p^Q$, and the $Q$-probability of T to be $q^Q := 1 - p^Q$, given by (7.11), with the tosses independent under $Q$.






Theorem 7.14. The $n$-period binomial model is complete. In particular, let $Y$ be a European
claim with maturity time $n$, and define

$$V_t(\omega_1\ldots\omega_t) := (1+r)^t\, E^Q[(1+r)^{-n}Y \mid \mathcal F_t](\omega_1\ldots\omega_t), \qquad t = 0, \ldots, n,$$

$$\pi_t(\omega_1\ldots\omega_{t-1}) := \frac{V_t(\omega_1\ldots\omega_{t-1}H) - V_t(\omega_1\ldots\omega_{t-1}T)}{S_t(\omega_1\ldots\omega_{t-1}H) - S_t(\omega_1\ldots\omega_{t-1}T)}, \qquad t = 1, \ldots, n.$$

Then, starting with initial wealth $X_0 := V_0 = E^Q[(1+r)^{-n}Y]$, the self-financing wealth process
corresponding to the portfolio process $\pi_1, \ldots, \pi_n$ is the process $V_0, \ldots, V_n$.

Proof. Let $V_0, \ldots, V_n$ and $\pi_1, \ldots, \pi_n$ be defined as in the theorem. Observe that $V_n = Y$
almost surely.

Start with wealth $X_0 = V_0 = E^Q[(1+r)^{-n}Y]$ and consider the self-financing wealth process generated by the
portfolio process $\pi_1, \ldots, \pi_n$. This wealth satisfies the recursive formula

$$X_{t+1} = (1+r)X_t + \pi_{t+1}(S_{t+1} - (1+r)S_t), \qquad t = 0, 1, \ldots, n-1.$$

We need to show that, with $X_t$, $V_t$, $\pi_t$ defined as above, we have

$$X_t = V_t, \quad \text{almost surely}, \quad \forall t \in \{0, \ldots, n\}. \qquad (7.20)$$

We proceed by induction. For $t = 0$, (7.20) holds by definition of $X_0$. Now assume that
(7.20) holds for some fixed value of $t \in \{0, \ldots, n-1\}$, i.e. for each fixed $(\omega_1\ldots\omega_t)$ we have

$$X_t(\omega_1\ldots\omega_t) = V_t(\omega_1\ldots\omega_t).$$

Then we need to show that

$$X_{t+1}(\omega_1\ldots\omega_t H) = V_{t+1}(\omega_1\ldots\omega_t H),$$
$$X_{t+1}(\omega_1\ldots\omega_t T) = V_{t+1}(\omega_1\ldots\omega_t T).$$

We shall prove the first equality, and note that the second can be proved similarly (an
exercise). Note first that $((1+r)^{-t}V_t)_{t=0}^n$ is a martingale under $Q$, since

$$\begin{aligned} E^Q[(1+r)^{-(t+1)}V_{t+1} \mid \mathcal F_t] &= E^Q\big[E^Q[(1+r)^{-n}Y \mid \mathcal F_{t+1}] \mid \mathcal F_t\big] && \text{(defn. of } V_{t+1}\text{)} \\ &= E^Q[(1+r)^{-n}Y \mid \mathcal F_t] && \text{(tower property)} \\ &= (1+r)^{-t}V_t. \end{aligned}$$

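So in particular,

$$V_t(\omega_1\ldots\omega_t) = E^Q[(1+r)^{-1}V_{t+1} \mid \mathcal F_t](\omega_1\ldots\omega_t) = \frac{1}{1+r}\left(p^Q V_{t+1}(\omega_1\ldots\omega_t H) + q^Q V_{t+1}(\omega_1\ldots\omega_t T)\right).$$

Since $(\omega_1\ldots\omega_t)$ will be fixed for the rest of the proof, we simplify notation by suppressing
these symbols. For example, the last equation is written as

$$V_t = \frac{1}{1+r}\left(p^Q V_{t+1}(H) + q^Q V_{t+1}(T)\right). \qquad (7.21)$$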
Now we compute

$$\begin{aligned} X_{t+1}(H) &= (1+r)X_t + \pi_{t+1}(S_{t+1}(H) - (1+r)S_t) \\ &= (1+r)V_t + \pi_{t+1}(S_{t+1}(H) - (1+r)S_t) && \text{(since } X_t = V_t\text{)} \\ &= p^Q V_{t+1}(H) + q^Q V_{t+1}(T) + \frac{V_{t+1}(H) - V_{t+1}(T)}{(u-d)S_t}\,(u - (1+r))S_t && \text{(by (7.21))} \\ &= p^Q V_{t+1}(H) + q^Q V_{t+1}(T) + q^Q\left(V_{t+1}(H) - V_{t+1}(T)\right) \\ &= V_{t+1}(H), \end{aligned}$$

where we have used $S_{t+1}(H) = S_t u$ and $S_{t+1}(T) = S_t d$ (so that $\pi_{t+1} = (V_{t+1}(H) - V_{t+1}(T))/((u-d)S_t)$), together with $q^Q = (u - (1+r))/(u-d)$ and $p^Q + q^Q = 1$.



□ 
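Theorem 7.14 can be checked exhaustively for small $n$. The sketch below, with hypothetical helper names, computes $V_t$ by the backward recursion (7.21) (which the proof shows agrees with the conditional-expectation definition), forms $\pi_{t+1}$ from the theorem, runs the self-financing recursion from $X_0 = V_0$ along every path, and asserts (7.20) at each step. It assumes a payoff depending only on $S_n$ and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def verify_replication(payoff, S0, u, d, r, n):
    """Check Theorem 7.14 on every path: starting from X_0 = V_0 and trading
    pi_t, the self-financing wealth satisfies X_t = V_t and ends at Y."""
    pQ = (1 + r - d) / (u - d)
    qQ = 1 - pQ

    def price(tosses):                   # stock price after a string of tosses
        return S0 * u ** tosses.count("H") * d ** tosses.count("T")

    def value(tosses):                   # V_t via the backward recursion (7.21)
        if len(tosses) == n:
            return payoff(price(tosses))
        return (pQ * value(tosses + "H") + qQ * value(tosses + "T")) / (1 + r)

    def hedge(prefix):                   # pi_{t+1}, chosen after t tosses
        return ((value(prefix + "H") - value(prefix + "T"))
                / (price(prefix + "H") - price(prefix + "T")))

    for tosses in product("HT", repeat=n):
        path = "".join(tosses)
        X = value("")                    # X_0 = V_0
        for t in range(n):
            prefix, nxt = path[:t], path[:t + 1]
            X = (1 + r) * X + hedge(prefix) * (price(nxt) - (1 + r) * price(prefix))
            assert X == value(nxt)       # (7.20) at time t + 1
        assert X == payoff(price(path))  # X_n = Y
    return True

print(verify_replication(lambda s: max(5 - s, 0),
                         Fraction(4), Fraction(2), Fraction(1, 2), Fraction(1, 4), 3))
```

The recursive `value` recomputes subtrees and is exponential in $n$, which is fine for a check but not for production pricing.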



Example 7.15 (European call in 2-period model). Let $u = 2$, $d = 1/u$, $r = 1/4$, $S_0 = 4$, so
that $p^Q = q^Q = 1/2$. Consider a European call with expiration time 2 and payoff function
$Y = (S_2 - 5)^+$. The possible stock prices in this model are shown in Figure 8.

[Figure 8: Two period binomial lattice. $S_0 = 4$; $S_1(H) = 8$, $S_1(T) = 2$; $S_2(HH) = 16$, $S_2(HT) = S_2(TH) = 4$, $S_2(TT) = 1$.]

There are four elements $\omega \in \Omega = \{HH, HT, TH, TT\}$, so in principle there are four
possible final stock prices. But in fact, two of the outcomes $\omega$ lead to the same stock price.
We say that the stock price is path-independent since it only depends on the number of H
and T in the sequence $\omega = (\omega_1, \omega_2)$ (where $\omega_t$, $t = 1, 2$, is either H or T), and does not depend
on the order in which the H and T occur. Thus $S_2(HT) = S_2(TH) = 4$, for example. The
terminal option payoffs for each $\omega \in \Omega$ are

$$Y(HH) = 11, \qquad Y(HT) = Y(TH) = Y(TT) = 0,$$

and these are of course the option values at time 2:

$$V_2(HH) = 11, \qquad V_2(HT) = V_2(TH) = V_2(TT) = 0.$$

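Then using the binomial algorithm in Theorem 7.14 we "work backwards in time" using the
fact that the discounted option value is a $Q$-martingale, to obtain

$$V_1(H) = \frac{1}{1+r}\left(p^Q V_2(HH) + q^Q V_2(HT)\right) = \frac45\left(\frac12(11) + \frac12(0)\right) = \frac{22}{5},$$

$$V_1(T) = \frac{1}{1+r}\left(p^Q V_2(TH) + q^Q V_2(TT)\right) = \frac45\left(\frac12(0) + \frac12(0)\right) = 0,$$

$$V_0 = \frac{1}{1+r}\left(p^Q V_1(H) + q^Q V_1(T)\right) = \frac45\left(\frac12 \cdot \frac{22}{5} + \frac12(0)\right) = \frac{44}{25} = 1.76.$$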
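The numbers in Example 7.15 can be reproduced exactly, and the formula for $\pi_t$ in Theorem 7.14 then also gives the hedge, which the example does not list. This sketch simply transcribes the example with `Fraction`; the dictionary layout is an illustrative choice. It yields $\pi_1 = 11/15$, $\pi_2(H) = 11/12$ and $\pi_2(T) = 0$.

```python
from fractions import Fraction as F

u, d, r, S0, K = F(2), F(1, 2), F(1, 4), F(4), F(5)
pQ = qQ = F(1, 2)
disc = 1 / (1 + r)                                    # one-period discount factor 4/5

S1 = {"H": u * S0, "T": d * S0}
S2 = {a + b: S1[a] * (u if b == "H" else d) for a in "HT" for b in "HT"}
V2 = {w: max(s - K, 0) for w, s in S2.items()}        # terminal call values
V1 = {a: disc * (pQ * V2[a + "H"] + qQ * V2[a + "T"]) for a in "HT"}
V0 = disc * (pQ * V1["H"] + qQ * V1["T"])

# Hedge ratios from Theorem 7.14.
pi1 = (V1["H"] - V1["T"]) / (S1["H"] - S1["T"])
pi2 = {a: (V2[a + "H"] - V2[a + "T"]) / (S2[a + "H"] - S2[a + "T"]) for a in "HT"}
print(V1["H"], V0, pi1, pi2["H"], pi2["T"])     # 22/5 44/25 11/15 11/12 0
```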

8 American options in the binomial model 

We briefly discuss the pricing of American derivative securities in the binomial model. American derivative securities can be exercised at any time up to and including maturity.

Definition 8.1. In a discrete-time framework with time set $T = \{0, 1, \ldots, n\}$, an American
derivative security with maturity $n$ is a sequence of nonnegative random variables $(Y_t)_{t=0}^n$
such that for each $t \in T$, $Y_t$ is $\mathcal F_t$-measurable. The owner of an American derivative security
can exercise at any time $t \in T$, and if he does, he receives the payment $Y_t$.

For example, an American put option of strike $K$ on a stock price $S = (S_t)_{t=0}^n$ can be
exercised at any time $t \in T$ to give the owner a payment $Y_t := (K - S_t)^+$, which is called
the intrinsic value of the option at time $t$.

Recall the pricing of European securities. Consider a binomial model with $n$ periods, so
the time set is $T = \{0, 1, \ldots, n\}$. Suppose $Y_n$ is the payoff of a European derivative. For
$t \in T$, we define by backward recursion

$$V_n := Y_n, \qquad V_t := \frac{1}{1+r}\left[p^Q V_{t+1}(H) + q^Q V_{t+1}(T)\right], \qquad t = 0, \ldots, n-1, \qquad (8.1)$$

where, as before, the second equation is a shorthand for

$$V_t(\omega_1\ldots\omega_t) = E^Q[(1+r)^{-1}V_{t+1} \mid \mathcal F_t](\omega_1\ldots\omega_t).$$

Then $V_t$ is the value of the option at time $t \in T$, and the hedging portfolio over $[t-1, t)$ is
$\pi_t$ given by

$$\pi_t = \frac{V_t(H) - V_t(T)}{S_t(H) - S_t(T)} = \frac{V_t(H) - V_t(T)}{S_{t-1}(u-d)}, \qquad t = 1, \ldots, n,$$

which is shorthand for

$$\pi_t(\omega_1\ldots\omega_{t-1}) = \frac{V_t(\omega_1\ldots\omega_{t-1}H) - V_t(\omega_1\ldots\omega_{t-1}T)}{S_t(\omega_1\ldots\omega_{t-1}H) - S_t(\omega_1\ldots\omega_{t-1}T)}, \qquad t = 1, \ldots, n.$$

Now suppose the option is American, with payoff $Y = (Y_t)_{t=0}^n$. At any time $t \in T$, the holder
of the American derivative can exercise the option and receive the payment $Y_t$. Hence, the
hedging portfolio should create a wealth process $X$ which satisfies

$$X_t \geq Y_t, \qquad \forall t \in T, \quad \text{almost surely}.$$
This is because the value of the American option at time $t$ is at least as much as the so-called
intrinsic value $Y_t$, and the value of the hedging portfolio at that time must equal the value
of the option.

This suggests that, to price an American derivative, we should replace the European
algorithm (8.1) by the following American algorithm:



$$V_n = Y_n, \qquad V_t = \max\left\{Y_t,\ \frac{1}{1+r}\left[p^Q V_{t+1}(H) + q^Q V_{t+1}(T)\right]\right\}, \qquad t = 0, \ldots, n-1, \qquad (8.2)$$



which checks whether the intrinsic value is greater than the discounted risk-neutral
expectation of later values, which would signify that the option should be exercised in that state.
Then $V_t$ is the value of the American derivative at time $t \in T$.
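The American algorithm (8.2) differs from the European recursion only by the comparison with the intrinsic value at each node. Below is a minimal sketch on the recombining lattice, assuming the intrinsic value depends only on the current stock price (true for the American put); the function name and the returned list of exercise nodes are illustrative additions.

```python
from fractions import Fraction

def american_binomial(intrinsic, S0, u, d, r, n):
    """American algorithm (8.2) on the recombining lattice; node (t, j) has
    j up-moves and stock price S0 * u**j * d**(t - j).

    Returns (V_0, exercise_nodes), where exercise_nodes lists the nodes (t, j),
    t < n, at which the intrinsic value strictly exceeds the continuation value.
    """
    pQ = (1 + r - d) / (u - d)
    qQ = 1 - pQ
    V = [intrinsic(S0 * u**j * d**(n - j)) for j in range(n + 1)]  # V_n = Y_n
    exercise = []
    for t in range(n - 1, -1, -1):
        new_V = []
        for j in range(t + 1):
            cont = (pQ * V[j + 1] + qQ * V[j]) / (1 + r)   # discounted Q-expectation
            Y = intrinsic(S0 * u**j * d**(t - j))
            if Y > cont:
                exercise.append((t, j))
            new_V.append(max(Y, cont))
        V = new_V
    return V[0], exercise

V0, exercise_nodes = american_binomial(lambda s: max(5 - s, 0),
                                       Fraction(4), Fraction(2), Fraction(1, 2), Fraction(1, 4), 2)
print(V0, exercise_nodes)    # 34/25 [(1, 0)]
```

With the data of Example 8.3 it returns $V_0 = 34/25$ and flags early exercise only at the node $(t, j) = (1, 0)$, i.e. after a tail.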

Remark 8.2 (Super-martingale property of American option price). In valuing European
options we found that the discounted option value is a $Q$-martingale (recall, for example,
(7.19)). From (8.2) we see that the value of the American option can be greater than that
given by a discounted risk-neutral expectation, because of the possibility of early exercise.
In other words, we might have that

$$V_t > E^Q\left[\frac{V_{t+1}}{1+r} \,\Big|\, \mathcal F_t\right],$$

or, equivalently,

$$E^Q[(1+r)^{-(t+1)}V_{t+1} \mid \mathcal F_t] < (1+r)^{-t}V_t,$$

so that the discounted option value is a $Q$-supermartingale. (It turns out that the discounted value process of an
American option is the smallest supermartingale that dominates the discounted payoff, though we do
not prove this here.)

Example 8.3 (American put in a 2-period model). Consider an American put option in a
2-period binomial model with $u = 2$, $d = 1/u$, $r = 1/4$, $S_0 = 4$, so that $p^Q = q^Q = 1/2$. Let
the option have payoff function $Y_t = (5 - S_t)^+$. The possible stock prices in this model are
shown in Figure 9. The terminal values of the option are given by $V_2 = Y_2 = (5 - S_2)^+$, and
these are also shown in the figure.

[Figure 9: Stock price and terminal value of American put. $S_0 = 4$; $S_1(H) = 8$, $S_1(T) = 2$; $S_2(HH) = 16$, $V_2(HH) = 0$; $S_2(HT) = S_2(TH) = 4$, $V_2(HT) = V_2(TH) = 1$; $S_2(TT) = 1$, $V_2(TT) = 4$.]
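Then the values of the option at time 1 are:

$$V_1(H) = \max\left[(5 - 8)^+,\ \frac45\left(\frac12 \cdot 0 + \frac12 \cdot 1\right)\right] = \max\left[0, \frac25\right] = \frac25,$$

$$V_1(T) = \max\left[(5 - 2)^+,\ \frac45\left(\frac12 \cdot 1 + \frac12 \cdot 4\right)\right] = \max[3, 2] = 3.$$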



In particular, we notice that at time 1, and for $\omega_1 = T$, the option should be exercised, as
the intrinsic value is greater than the discounted risk-neutral expectation of later values.
The option value at time zero is

$$V_0 = \max\left[(5 - 4)^+,\ \frac45\left(\frac12 \cdot \frac25 + \frac12 \cdot 3\right)\right] = \max\left[1, \frac{34}{25}\right] = \frac{34}{25} = 1.36.$$



Now let us attempt to construct the hedging portfolio for this option. We begin with initial
wealth $X_0 = 34/25$, and we compute $\pi_1$ via the replication condition for $\omega_1 = H$:

$$X_1(H) = (1+r)X_0 + \pi_1(S_1(H) - (1+r)S_0) = V_1(H) = \frac25,$$

which yields $\pi_1 = -13/30$. We could just as well calculate $\pi_1$ by looking at the wealth $X_1(T)$,
as follows:
as follows: 

$$X_1(T) = (1+r)X_0 + \pi_1(S_1(T) - (1+r)S_0) = V_1(T) = 3,$$

which also yields $\pi_1 = -13/30$. Now let us try to compute $\pi_2$ in a similar manner:

$$X_2(HH) = (1+r)X_1(H) + \pi_2(H)(S_2(HH) - (1+r)S_1(H)) = V_2(HH) = 0,$$

which yields $\pi_2(H) = -1/12$. The same result is obtained if one considers the wealth $X_2(HT)$.
Now let us try to compute $\pi_2(T)$ as follows:

$$X_2(TH) = (1+r)X_1(T) + \pi_2(T)(S_2(TH) - (1+r)S_1(T)) = V_2(TH) = 1,$$

which yields $\pi_2(T) = -11/6$. However, if we try to compute $\pi_2(T)$ using $X_2(TT)$, we get

$$X_2(TT) = (1+r)X_1(T) + \pi_2(T)(S_2(TT) - (1+r)S_1(T)) = V_2(TT) = 4,$$

which yields $\pi_2(T) = -1/6$. In other words, we get different answers for $\pi_2(T)$, the position
in stock that should be chosen at the start of the interval $[1, 2)$ when $\omega_1 = T$! This apparent
anomaly has arisen because $X_1(T) = 3$ (since the American put is exercised when $\omega_1 = T$)
rather than 2, which would be the case if the option were European (and you can check that
in this case the above calculations would both have yielded $\pi_2(T) = -1$).

This example shows that we need to analyse the hedging portfolio for an American
option more closely.
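The inconsistency can be reproduced directly by solving each replication condition for $\pi_2(T)$. The sketch below (helper name hypothetical) uses the stock prices of Example 8.3 and compares $X_1(T) = 3$ (American, exercised) with $X_1(T) = 2$ (European).

```python
from fractions import Fraction as F

r = F(1, 4)
S = {"": F(4), "H": F(8), "T": F(2), "HH": F(16), "HT": F(4), "TH": F(4), "TT": F(1)}
V2 = {w: max(5 - S[w], 0) for w in ("HH", "HT", "TH", "TT")}

def pi2_from(state, X1):
    """Solve X_2(state) = (1+r) X_1 + pi_2 (S_2(state) - (1+r) S_1) = V_2(state) for pi_2."""
    S1 = S[state[0]]
    return (V2[state] - (1 + r) * X1) / (S[state] - (1 + r) * S1)

# American put: X_1(T) = V_1(T) = 3, so the two replication conditions disagree.
print(pi2_from("TH", F(3)), pi2_from("TT", F(3)))   # -11/6 -1/6
# European put: X_1(T) = 2, and both conditions give pi_2(T) = -1.
print(pi2_from("TH", F(2)), pi2_from("TT", F(2)))   # -1 -1
```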
8.1 Value of hedging portfolio for an American option

Consider the following generalisation of the evolution of the wealth of a self-financing portfolio, equation (7.6):

$$X_t = (1+r)(X_{t-1} - C_{t-1}) + \pi_t(S_t - (1+r)S_{t-1}), \qquad t = 1, \ldots, n, \qquad (8.3)$$

where, for $t \in \{0, 1, \ldots, n-1\}$, $C_t \geq 0$ represents the amount of wealth consumed at time $t$.
In other words, we are allowing for some funds to be withdrawn from the self-financing
portfolio. We found earlier that, for a self-financing portfolio, the discounted wealth process
$((1+r)^{-t}X_t)_{t=0}^n$ is a $Q$-martingale. Allowing consumption from the portfolio
means that the discounted wealth process will be a $Q$-supermartingale (i.e. it will tend to
go down).

To appreciate why this adjustment might be needed, consider the American algorithm in
(8.2). As observed in Remark 8.2, because of the possibility of early exercise the value of the option can be greater than that given by a discounted
risk-neutral expectation, that is, we might have

$$V_t > E^Q\left[\frac{V_{t+1}}{1+r} \,\Big|\, \mathcal F_t\right],$$

so that the discounted option value is a $Q$-supermartingale rather than a $Q$-martingale.

To see how consumption enters the hedging portfolio, consider the situation in which

$$V_t > E^Q\left[\frac{V_{t+1}}{1+r} \,\Big|\, \mathcal F_t\right]. \qquad (8.4)$$

Then the holder of the American option should exercise (this is the case in the state $\omega_1 = T$
in Example 8.3), so that hedging should stop at this point (which is why we had difficulty
isolating what the hedging portfolio should be in the example). If the holder of the option
does not exercise, then the seller of the option may consume to close the gap between the
left and right hand sides of (8.4). By doing this, he can ensure that $X_t = V_t$ for all $t \in T$,
where $V_t$ is the value defined by the American algorithm.

In Example 8.3, we had $V_1(T) = 3$, $V_2(TH) = 1$, $V_2(TT) = 4$, so that

$$E^Q\left[\frac{V_2}{1+r} \,\Big|\, \mathcal F_1\right](T) = \frac45\left(\frac12 \cdot 1 + \frac12 \cdot 4\right) = 2,$$

and there is a gap of size 1 in (8.4). If the owner of the option does not exercise it at time 1
in the state $\omega_1 = T$, then the seller can consume an amount 1 at time 1. Thereafter he uses
the usual hedging portfolio

$$\pi_t = \frac{V_t(H) - V_t(T)}{(u-d)S_{t-1}}.$$
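The consumption fix in the example can be checked the same way: consume the gap $C_1 = 1$ in state $\omega_1 = T$, then use the usual hedge ratio. A sketch with exact arithmetic, assuming the Example 8.3 data (variable names illustrative):

```python
from fractions import Fraction as F

u, d, r = F(2), F(1, 2), F(1, 4)
S1_T, V1_T = F(2), F(3)                  # state omega_1 = T in Example 8.3
V2 = {"TH": F(1), "TT": F(4)}
pQ = qQ = F(1, 2)

continuation = (pQ * V2["TH"] + qQ * V2["TT"]) / (1 + r)   # = 2
C1 = V1_T - continuation                                    # gap in (8.4), consumed if not exercised
X1 = V1_T - C1                                              # wealth left after consumption
pi2 = (V2["TH"] - V2["TT"]) / ((u - d) * S1_T)             # usual hedge ratio

# With wealth evolving as in (8.3), both terminal states are replicated by pi2.
for w, S2 in (("TH", u * S1_T), ("TT", d * S1_T)):
    assert (1 + r) * X1 + pi2 * (S2 - (1 + r) * S1_T) == V2[w]
print(C1, pi2)                            # 1 -1
```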
In the example, we had $V_1(T) = Y_1(T)$, which means that, acting optimally, the holder of the
option should exercise. It turns out that it is optimal for the owner of the American option
to exercise whenever its value $V_t$ agrees with its intrinsic value $Y_t$.

Part III 

Continuous time 

9 Brownian motion 

9.1 Random walk 

Toss a coin infinitely many times, so that the sample space $\Omega$ is the set of all infinite sequences
$\omega = (\omega_1\omega_2\ldots)$ of H and T. One can construct a well-defined probability space $(\Omega, \mathcal F, P)$ called
the space of infinite coin tosses (though this is not completely trivial, as $\Omega$ is an uncountably
infinite space), as well as a filtration $(\mathcal F_t)_{t\geq 0}$ on this space. We do not have time to delve
into the construction of infinite coin toss space here; Chapters 1 and 2 of Shreve [13] give a
detailed account.

Assume that each toss is independent, and that on each toss the probability of H is $p$, so
that the probability of T is $q := 1 - p$. Define

$$Y_j(\omega) := \begin{cases} \alpha, & \omega_j = H, \\ \beta, & \omega_j = T, \end{cases} \qquad j = 1, 2, \ldots \qquad (9.1)$$

The random variable $Y_j$, which always takes one of two values, is sometimes called a Bernoulli
random variable.

Define a process $M = (M_k)_{k=0}^\infty$ by

$$M_0 := 0, \qquad M_k := \sum_{j=1}^{k} Y_j, \quad k = 1, 2, \ldots \qquad (9.2)$$

The process $(M_k)_{k=0}^\infty$ is called a random walk. Each $M_k$ is a sum of independent, identically
distributed (i.i.d.) Bernoulli variables, and is sometimes called a binomial random variable.

Remark 9.1 (Symmetric random walk). For $\alpha = 1$, $\beta = -1$, $p = q = \frac12$, the process $(M_k)_{k=0}^\infty$
is a symmetric random walk, whose analogue in continuous time is Brownian motion, as we
shall see.
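A random walk (9.2) is easy to simulate. The sketch below is illustrative only: it uses Python's `random` module with a seed for reproducibility, and the function name and default arguments ($\alpha = 1$, $\beta = -1$, $p = 1/2$, the symmetric case of Remark 9.1) are assumptions. With $\alpha = 1$, $\beta = 0$, $M_k$ simply counts heads in $k$ tosses.

```python
import random

def random_walk(k_max, alpha=1, beta=-1, p=0.5, seed=None):
    """Simulate M_0, ..., M_{k_max} from (9.2): Y_j = alpha w.p. p, beta w.p. 1 - p."""
    rng = random.Random(seed)
    M = [0]                                   # M_0 = 0
    for _ in range(k_max):
        Y = alpha if rng.random() < p else beta
        M.append(M[-1] + Y)                   # M_k = M_{k-1} + Y_k
    return M

walk = random_walk(10, seed=1)                            # symmetric walk of Remark 9.1
heads = random_walk(20, alpha=1, beta=0, p=0.3, seed=2)   # counts heads
print(walk)
```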

9.2 BM as scaled limit of symmetric random walk 

On the infinite coin toss space $(\Omega, \mathcal F, P)$, define the random variables

$$X_j(\omega) := \begin{cases} 1, & \omega_j = H, \\ -1, & \omega_j = T, \end{cases} \qquad j = 1, 2, \ldots,$$

with $P\{\omega_j = H\} = P\{\omega_j = T\} = \frac12$, so that each $X_j$ has mean zero and variance 1 (i.e.
$E[X_j \mid \mathcal F_{j-1}] = 0$ and $E[X_j^2 \mid \mathcal F_{j-1}] = 1$). So $X_1, X_2, \ldots$ is a sequence of independent, identically
distributed random variables. Then define the symmetric random walk $M$ via

$$M_0 := 0, \qquad M_k := \sum_{j=1}^{k} X_j, \quad k = 1, 2, \ldots$$

By the Law of Large Numbers we know that

$$\frac1k M_k \to 0, \quad \text{almost surely, as } k \to \infty.$$

By the Central Limit Theorem we know that for large $k$, $M_k/\sqrt k$ is approximately standard
normal:

$$\frac{1}{\sqrt k}M_k \to Z \sim N(0, 1), \quad \text{in distribution, as } k \to \infty.$$

Brownian motion arises if we suitably speed up the tossing of the coins and scale the
size of each random walk increment. To this end, if $t > 0$ is of the form $t = k/n =: k\,\delta t$ (so
$\delta t = 1/n$ is the time between coin tosses) for positive integers $k, n$, then define a continuous
time process via

$$W^{(n)}_t := \frac{1}{\sqrt n}M_{nt} = \sqrt{\frac tk}\,M_k = \sqrt{\delta t}\,M_{t/\delta t}, \qquad t \geq 0,$$

with linear interpolation used to define $W^{(n)}_t$ for any times $t \geq 0$ not of the form $k/n$. Take
the limit $k \to \infty$, with $t$ fixed.³ Then since $\frac{1}{\sqrt k}M_k \to Z \sim N(0, 1)$ in distribution as $k \to \infty$, we have that

$$W^{(n)}_t = \sqrt t\,\frac{M_k}{\sqrt k} \to W_t \sim N(0, t), \quad \text{in distribution, as } n \to \infty,$$

and we call the limiting process $(W_t)_{t\geq 0}$ a standard Brownian motion.
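A Monte Carlo sketch of the scaling: sample $W^{(n)}_t$ many times and compare the sample mean and variance with $0$ and $t$. Since $W^{(n)}_t = M_{nt}/\sqrt n$ has variance exactly $nt/n = t$, the variance match holds for every $n$; the content of the CLT is that the distribution becomes normal. The parameters ($t = 2$, $n = 200$, 2000 samples), the seed and the function name are arbitrary illustrative choices.

```python
import math
import random

def scaled_walk_at(t, n, rng):
    """Sample W^(n)_t = M_{nt} / sqrt(n) for t a multiple of 1/n (symmetric +-1 steps)."""
    k = round(n * t)                      # number of coin tosses up to time t
    M_k = sum(1 if rng.random() < 0.5 else -1 for _ in range(k))
    return M_k / math.sqrt(n)

rng = random.Random(0)
t, n, samples = 2.0, 200, 2000
draws = [scaled_walk_at(t, n, rng) for _ in range(samples)]
mean = sum(draws) / samples
var = sum((x - mean) ** 2 for x in draws) / (samples - 1)
print(round(mean, 2), round(var, 2))      # mean near 0, variance near t = 2
```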

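Notice that with $t = k\,\delta t$, we have (though this is purely formal)

$$\frac{dW^{(n)}_t}{dt} = \lim_{\delta t \to 0}\frac{W^{(n)}_{t+\delta t} - W^{(n)}_t}{\delta t} = \lim_{\delta t \to 0}\frac{\sqrt{\delta t}\,X_{k+1}}{\delta t} = \lim_{\delta t \to 0}\frac{\pm 1}{\sqrt{\delta t}} = \pm\infty.$$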



If, instead of $W^{(n)}_t$, we were to define

$$V^{(n)}_t := \frac1n M_{nt} = \frac tk M_k = \delta t\,M_{t/\delta t},$$

then by the Law of Large Numbers,

$$V^{(n)}_t \to 0, \quad \text{as } k \to \infty,$$



³ Equivalently, $n \to \infty$, or equivalently, $\delta t \to 0$, where $\delta t := 1/n$ is the time interval between coin tosses.
Then $k = nt = t/\delta t$, and hence $t = k\,\delta t$, so we are speeding up the coin tossing, and since $t/k = \delta t$,
$W^{(n)}_t = \sqrt{\delta t}\,M_{t/\delta t} = \sqrt{\delta t}\,M_k$, so that we are scaling each increment of the random walk by $\sqrt{\delta t}$.
and

$$\frac{dV^{(n)}_t}{dt} = \lim_{\delta t \to 0}\frac{V^{(n)}_{t+\delta t} - V^{(n)}_t}{\delta t} = \lim_{\delta t \to 0} X_{k+1} = \pm 1,$$

so while the derivative of $V^{(n)}$ is defined (unlike that of $W^{(n)}$), the process $V^{(n)}$ is trivially
zero in the limit.

In other words the Brownian "particle" can only have motion if it has infinite velocity. 
This is a manifestation of the fact that paths of Wt are almost surely continuous but not 
differentiable, as we will see again in a short while. 

Remark 9.2 (Random walks and the binomial model). In the binomial model, the logarithm
of the stock price process follows a random walk. A similar analysis to the one above can be used to
show that the continuous time limit of a binomial model has stock price process given by



where $b$ and $\sigma > 0$ are constants related to the binomial parameters $u$, $d$ and to the
probability $p$ of the stock price rising in the binomial tree, by



The process (9.3) is known as geometric Brownian motion.

9.3 Brownian motion

We shall see that Brownian motion (BM) is a continuous stochastic process which is Markov, 
Gaussian, and a martingale. 

Let $X \sim N(\mu, \sigma^2)$ denote that a random variable $X$ is normally distributed with mean $\mu$
and variance $\sigma^2$.

Definition 9.3 (Brownian motion). A standard 1-dimensional Brownian motion (BM) is a
continuous adapted process $W := (W_t, \mathcal F_t)_{0 \leq t < \infty}$ on some filtered probability space $(\Omega, \mathcal F, \mathbb F :=
(\mathcal F_t)_{t\geq 0}, P)$ with the properties that $W_0 = 0$ a.s. and, for $0 \leq s < t$, $W_t - W_s$ is independent
of $\mathcal F_s$ and normally distributed as $W_t - W_s \sim N(0, t - s)$.

The filtration $(\mathcal F_t)_{t\geq 0}$ is a part of the definition of BM. However, if we are given $(W_t)_{t\geq 0}$
but no filtration, and if we know that $W$ has stationary, independent increments and that
$W_t = W_t - W_0 \sim N(0, t)$, then with $(\mathcal F^W_t)_{t\geq 0}$ being the filtration generated by the BM, we
have that $(W_t, \mathcal F^W_t)_{0 \leq t < \infty}$ is a BM in the sense of Definition 9.3 (see [9], Problem 1.4).

Here is another definition of BM, based on a process called quadratic variation, one definition of which is given below. Let $\mathcal M_2$ denote the space of right-continuous square-integrable
martingales on a complete filtered probability space $(\Omega, \mathcal F, \mathbb F := (\mathcal F_t)_{t\geq 0}, P)$: that is, for
$M := (M_t)_{t\geq 0} \in \mathcal M_2$ we have $M_0 = 0$ a.s., and $E[M_t^2] < \infty$, for all $t \geq 0$.

Definition 9.4 (Quadratic variation). For $M \in \mathcal M_2$, the quadratic variation (QV) of $M$ is
the unique, increasing adapted process $[M]$ such that $[M]_0 = 0$ a.s. and such that $(M_t^2 -
[M]_t)_{t\geq 0}$ is a martingale.




Definition 9.5 (Cross-variation). For $X, Y \in \mathcal M_2$, define their cross-variation process
$([X, Y]_t)_{t\geq 0}$ by

$$[X, Y]_t := \frac14\left([X + Y]_t - [X - Y]_t\right), \qquad t \geq 0.$$

For $X, Y \in \mathcal M_2^c$ (i.e. continuous), this is the unique adapted process $[X, Y]$ of bounded variation such
that $[X, Y]_0 = 0$ a.s. and such that $(X_tY_t - [X, Y]_t)_{t\geq 0}$ is a martingale.

Remark 9.6. For Brownian motion $W := (W_t)_{t\geq 0}$, we have $[W]_t = t$, since $(W_t^2 - t)_{t\geq 0}$ is a
martingale (see Problem Sheet 2). Indeed, Brownian motion may be defined as the unique
continuous martingale, starting at zero, that satisfies this property (Lévy's characterisation).
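Remark 9.6 can be illustrated numerically using the standard fact that, for Brownian motion, the sum of squared increments over a partition of $[0, T]$ converges to $[W]_T = T$ as the mesh goes to zero. The sketch samples a path from independent $N(0, \delta t)$ increments (as in Definition 9.3) with `random.gauss`; the horizon, grid size, seed and function names are arbitrary assumptions.

```python
import math
import random

def brownian_path(T, steps, rng):
    """Sample W on the grid t_i = i*T/steps using independent N(0, dt) increments."""
    dt = T / steps
    W = [0.0]
    for _ in range(steps):
        W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return W

def realised_qv(W):
    """Sum of squared increments over the grid: approximates [W]_T."""
    return sum((b - a) ** 2 for a, b in zip(W, W[1:]))

rng = random.Random(42)
W = brownian_path(T=1.5, steps=100_000, rng=rng)
print(realised_qv(W))   # close to T = 1.5 for a fine grid
```

The spread of the estimate shrinks like $\sqrt{2/\text{steps}}\,T$, so refining the grid makes the sum of squares look deterministic, in line with $[W]_t = t$.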

We denote by $(\mathcal F_t)_{t\geq 0}$ the filtration generated by Brownian motion. Its required properties
are:

• For each $t$, $W_t$ is $\mathcal F_t$-measurable;

• for each $t$ and for $t < t_1 < t_2 < \ldots < t_n$, the Brownian motion increments $W_{t_1} -
W_t, W_{t_2} - W_{t_1}, \ldots, W_{t_n} - W_{t_{n-1}}$ are independent of $\mathcal F_t$.

Here is one way to construct $\mathcal F_t$. First fix $t$. Let $s \in [0, t]$ and $A \in \mathcal B(\mathbb R)$ be given. Put
the set

$$\{W_s \in A\} = \{\omega : W_s(\omega) \in A\}$$

in $\mathcal F_t$. Do this for all possible numbers $s \in [0, t]$ and all Borel sets $A \in \mathcal B(\mathbb R)$. Then put in
every other set required by the $\sigma$-algebra properties. This $\sigma$-algebra $\mathcal F_t$ contains exactly the
information learned by observing the Brownian motion up to time $t$, and $(\mathcal F_t)_{t\geq 0}$ is called
the filtration generated by the Brownian motion.

9.4 Properties of BM 

Stationarity We say a stochastic process $X = (X_t)_{t\geq 0}$ is stationary if $X_t$ has the same
distribution as $X_{t+h}$ for any $h > 0$. Brownian motion has stationary increments. To see
this, define the increment process $I = (I_t)_{t\geq 0}$ by $I_t := W_{t+h} - W_t$. Then $I_t \sim N(0, h)$ and
$I_{t+h} = W_{t+2h} - W_{t+h} \sim N(0, h)$ have the same distribution. This is equivalent to saying that
the process $(W_{t+h} - W_t)_{h\geq 0}$ has the same distribution for all $t$.

Martingale property The independent increments property allows us to show that BM is
a martingale. For $0 \leq s < t$ we have

$$E[W_t \mid \mathcal F_s] = E[W_t - W_s + W_s \mid \mathcal F_s] = E[W_t - W_s \mid \mathcal F_s] + W_s = E[W_t - W_s] + W_s = W_s.$$

Covariance of BM at different times Let $0 \le s < t$ be given. Then $W_s$ and $W_t - W_s$ are independent, and $(W_s, W_t)$ are jointly normal with $E[W_s] = E[W_t] = E[W_t - W_s] = 0$, $\mathrm{var}(W_s) = s$, $\mathrm{var}(W_t) = t$, $\mathrm{var}(W_t - W_s) = t - s$, so that the covariance of $W_s$ and $W_t$ is

$$\begin{aligned} \mathrm{cov}(W_s, W_t) &:= E[(W_s - E[W_s])(W_t - E[W_t])] \\ &= E[W_sW_t] \\ &= E[W_s(W_t - W_s + W_s)] \\ &= E[W_s(W_t - W_s)] + E[W_s^2] \\ &= E[W_s]\,E[W_t - W_s] + s \quad \text{(by independence)} \\ &= s. \end{aligned}$$

Thus, for any $s \ge 0$, $t \ge 0$ (not necessarily $s < t$), we have

$$\mathrm{cov}(W_s, W_t) = E[W_sW_t] = s \wedge t = \min(s,t),$$

or, equivalently, for $0 < s < t$ the covariance matrix of the vector $W_{s,t} = (W_s, W_t)$ is $C = A^{-1}$, given by

$$C = A^{-1} = \begin{pmatrix} s & s \\ s & t \end{pmatrix} \quad \text{(positive definite, symmetric)}.$$
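A quick Monte Carlo sanity check of $\mathrm{cov}(W_s, W_t) = \min(s,t)$, written as a minimal Python sketch (standard library only). It assumes we sample $W_{s\wedge t}$ directly and then add an independent $N(0, |t-s|)$ increment, exactly as in the derivation above. The function name `sample_cov`, the seed and the number of paths are illustrative choices.

```python
import random

def sample_cov(s, t, n_paths=20000, seed=1):
    """Monte Carlo estimate of E[W_s W_t] for standard Brownian motion."""
    rng = random.Random(seed)
    lo, hi = min(s, t), max(s, t)
    total = 0.0
    for _ in range(n_paths):
        w_lo = rng.gauss(0.0, lo ** 0.5)                 # W_lo ~ N(0, lo)
        w_hi = w_lo + rng.gauss(0.0, (hi - lo) ** 0.5)   # add an independent N(0, hi - lo) increment
        total += w_lo * w_hi
    return total / n_paths

print(sample_cov(0.5, 2.0), min(0.5, 2.0))   # the estimate should be close to 0.5
```

With 20000 paths the standard error for $(s,t) = (0.5, 2)$ is about $0.008$, so agreement to one or two decimal places is what one should expect.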

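Definition 9.7 (Transition density). Fix $x \in \mathbb{R}$, $t_0 \in \mathbb{R}_+$. Then

$$P(W_{t_0+t} \in [y, y+dy] \mid W_{t_0} = x) = p(t,x,y)\,dy,$$

where the transition density of Brownian motion is the function

$$p(t,x,y) = \frac{1}{\sqrt{2\pi t}}\exp\left(-\frac{(y-x)^2}{2t}\right), \quad y \in \mathbb{R},\ t > 0.$$

This is the probability density that the BM moves from $x$ to $y \in \mathbb{R}$ in a time period $t$: conditionally on $W_{t_0} = x$, the value $W_{t_0+t}$ has the $N(x,t)$ distribution.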
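A small numerical check of Definition 9.7, as a Python sketch (standard library only). It assumes the illustrative values $t = 0.5$, $x = 1$ and a truncated integration window $[x-10, x+10]$. It confirms that $p(t,x,\cdot)$ integrates to one with mean $x$ and variance $t$, so it is the $N(x,t)$ density, and that it is symmetric in $x$ and $y$. The helper `integrate` is a hypothetical midpoint-rule routine written for this illustration.

```python
import math

def p(t, x, y):
    """Transition density of Brownian motion: the N(x, t) density evaluated at y (t > 0)."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def integrate(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

t, x = 0.5, 1.0
lo, hi = x - 10.0, x + 10.0   # the mass outside this window is negligible for t = 0.5
mass = integrate(lambda y: p(t, x, y), lo, hi)                 # close to 1
mean = integrate(lambda y: y * p(t, x, y), lo, hi)             # close to x
var = integrate(lambda y: (y - x) ** 2 * p(t, x, y), lo, hi)   # close to t
print(mass, mean, var)
```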

Starting points other than zero For a standard Brownian motion $W$ that starts at zero we have a probability space $(\Omega, \mathcal{F}, P)$ that satisfies $P\{W_0 = 0\} = 1$. Then for $t > 0$, $W_t \sim N(0,t)$. For $x \in \mathbb{R}$, we can define a process $W_t^x := x + W_t$, which will satisfy $P\{W_0^x = x\} = 1$ and, for $t > 0$, $W_t^x \sim N(x,t)$.

Equivalently, we can define another probability measure $P^x$ (or, more formally, a probability space $(\Omega, \mathcal{F}, P^x)$) under which $P^x\{W_0 = x\} = 1$, and with $W$ having stationary independent increments under $P^x$: for $s < t$, $W_t - W_s \sim N(0, t-s)$ and is independent of $\mathcal{F}_s$. Then, under $P^x$, $W_t \sim N(x,t)$. In this case, we say that $W$ is a Brownian motion starting at $x$. We see that such a Brownian motion has the same law as $x + W$, where $W$ is a standard Brownian motion starting at zero.

Note that: 

• If $x \ne 0$, then $P^x$ puts all its probability on a completely different set from $P$ (the paths with $W_0 = x$ rather than $W_0 = 0$).

• The distribution of $W_t$ under $P^x$ is the same as the distribution of $W_t^x = x + W_t$ under $P$, that is

$$\mathrm{Law}(W^x, P) = \mathrm{Law}(W, P^x).$$
Markov property We can show that $W$ is a Markov process as follows. Recall that the Markov property is equivalent to stating that for $s \ge 0$, $t \ge 0$ and every (bounded, Borel) function $h$, we have $E[h(W_{s+t}) \mid \mathcal{F}_s] = g(W_s)$, where $g$ is a function (depending on $h$ and $t$). Consider

$$E[h(W_{s+t}) \mid \mathcal{F}_s] = E[h(W_{s+t} - W_s + W_s) \mid \mathcal{F}_s].$$

Use the properties that $W_{s+t} - W_s$ is independent of $\mathcal{F}_s$, and that $W_s$ is $\mathcal{F}_s$-measurable, along with the following independence lemma. Let $X, Y$ be random variables on a probability space $(\Omega, \mathcal{F}, P)$, and let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$, with

• $X$ $\mathcal{G}$-measurable;

• $Y$ independent of $\mathcal{G}$.

If $f(x,y)$ is a function of two variables, and if we define

$$g(x) := E[f(x,Y)],$$

then we have

$$E[f(X,Y) \mid \mathcal{G}] = g(X).$$

In this lemma, take $\mathcal{G} = \mathcal{F}_s$, $X = W_s$, $Y = W_{s+t} - W_s$, and $f(x,y) = h(x+y)$. Then define

$$\begin{aligned} g(x) &:= E[h(W_{s+t} - W_s + x)] \\ &= E[h(x + W_t)] \quad (\text{since } W_t \sim N(0,t) \text{ has the same distribution as } W_{s+t} - W_s) \\ &= E^x[h(W_t)]. \end{aligned}$$

Then

$$E[h(W_{s+t}) \mid \mathcal{F}_s] = g(W_s) = E^{W_s}[h(W_t)],$$

which is the Markov property. 
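Take the illustrative choice $h(y) = y^2$. The function in the proof is then $g(x) = E[(x + W_t)^2] = x^2 + t$, so the Markov property gives $E[W_{s+t}^2 \mid \mathcal{F}_s] = W_s^2 + t$, a function of $W_s$ alone. The Python sketch below (standard library only; the helper `g_mc`, the seed and the sample size are hypothetical) estimates $g(x)$ by Monte Carlo and compares it with $x^2 + t$.

```python
import random

def g_mc(h, x, t, n=50000, seed=2):
    """Monte Carlo estimate of g(x) = E[h(x + W_t)], where W_t ~ N(0, t)."""
    rng = random.Random(seed)
    sd = t ** 0.5
    return sum(h(x + rng.gauss(0.0, sd)) for _ in range(n)) / n

x, t = 0.7, 1.5
est = g_mc(lambda y: y * y, x, t)
print(est, x * x + t)   # compare with the closed form g(x) = x**2 + t for h(y) = y**2
```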

In fact, Brownian motion has the strong Markov property (though we do not prove this). 

Strong Markov property Fix $x \in \mathbb{R}$ and define

$$\tau := \min\{t \ge 0 : W_t = x\}.$$

Then we have

$$E[h(W_{\tau+t}) \mid \mathcal{F}_\tau] = g(x) = E^x[h(W_t)].$$
9.5 Quadratic variation of BM 

Definition 9.8 (p-th variation). Let $\mathcal{P} = \{t_0, t_1, \dots, t_n\}$ be a partition of $[0,t]$, i.e.

$$0 = t_0 < t_1 < \dots < t_n = t.$$

The mesh of the partition is defined to be

$$\|\mathcal{P}\| = \max_{k=0,\dots,n-1} |t_{k+1} - t_k|.$$

The p-th variation of a function $f : \mathbb{R}_+ \to \mathbb{R}$ on an interval $[0,t]$, $[f,f]^{(p)}(t)$, is defined by

$$[f,f]^{(p)}(t) := \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1}|f(t_{k+1}) - f(t_k)|^p.$$

In particular, if $p = 1$ this is called the total variation (or the first variation) and if $p = 2$ this is called the quadratic variation.

9.5.1 First variation

Consider the first variation (or total variation), $[f,f]^{(1)}$, of a function $f$. Suppose $f$ is differentiable with continuous derivative $f'$. Then the Mean Value Theorem⁴ implies that in each subinterval $[t_k, t_{k+1}]$ there is a point $t_k^*$ such that

$$f(t_{k+1}) - f(t_k) = (t_{k+1} - t_k)f'(t_k^*).$$

Then

$$\sum_{k=0}^{n-1}|f(t_{k+1}) - f(t_k)| = \sum_{k=0}^{n-1}|f'(t_k^*)|(t_{k+1} - t_k),$$

and so, since the right-hand side is a Riemann sum for $|f'|$,

$$[f,f]^{(1)}(t) = \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1}|f'(t_k^*)|(t_{k+1} - t_k) = \int_0^t |f'(s)|\,ds.$$

Thus, the first variation measures the total amount of up and down motion of the path of $f$ over the interval $[0,t]$.

9.5.2 Quadratic variation of Brownian motion 

To simplify notation, we write $[f,f]_t = [f]_t$ for the quadratic variation of a function $f$ over the interval $[0,t]$.

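Lemma 9.9. If $f$ is differentiable with continuous derivative, then $[f]_t = 0$.
Proof. Using the points $t_k^*$ given by the Mean Value Theorem, as in Section 9.5.1,

$$\sum_{k=0}^{n-1}|f(t_{k+1}) - f(t_k)|^2 = \sum_{k=0}^{n-1}|f'(t_k^*)|^2(t_{k+1} - t_k)^2 \le \|\mathcal{P}\|\sum_{k=0}^{n-1}|f'(t_k^*)|^2(t_{k+1} - t_k),$$

and so

$$[f]_t \le \lim_{\|\mathcal{P}\|\to 0}\|\mathcal{P}\|\sum_{k=0}^{n-1}|f'(t_k^*)|^2(t_{k+1} - t_k) = \lim_{\|\mathcal{P}\|\to 0}\|\mathcal{P}\|\int_0^t |f'(s)|^2\,ds = 0.$$

□

⁴ The Mean Value Theorem states that if $f$ is continuous on $[a,b]$ and differentiable in $(a,b)$, then there is a point $x \in (a,b)$ at which $f(b) - f(a) = (b-a)f'(x)$.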
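Both limits in Sections 9.5.1 and 9.5.2 can be seen numerically for a smooth function. This minimal Python sketch (standard library only) assumes the illustrative choice $f = \sin$ on $[0, 2\pi]$ and uniform partitions, and uses a hypothetical helper name `p_variation`. The first variation should approach $\int_0^{2\pi}|\cos s|\,ds = 4$. The quadratic variation should shrink like $1/n$, consistent with the bound $\|\mathcal{P}\|\int_0^t|f'(s)|^2\,ds = (2\pi/n)\,\pi$.

```python
import math

def p_variation(f, t, n, p):
    """Sum of |f(t_{k+1}) - f(t_k)|**p over the uniform partition t_k = k t / n of [0, t]."""
    vals = [f(k * t / n) for k in range(n + 1)]
    return sum(abs(b - a) ** p for a, b in zip(vals, vals[1:]))

T = 2.0 * math.pi
for n in (10, 100, 1000):
    tv = p_variation(math.sin, T, n, 1)   # approaches the integral of |cos| over [0, 2 pi], i.e. 4
    qv = p_variation(math.sin, T, n, 2)   # shrinks roughly like 2 pi^2 / n
    print(n, round(tv, 6), round(qv, 6))
```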

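Theorem 9.10. For Brownian motion $W = (W_t)_{t\ge 0}$ we have

$$[W]_t = t, \quad t \ge 0,$$

or more precisely

$$P\{\omega \in \Omega : [W]_t(\omega) = t\} = 1.$$

In particular, by Lemma 9.9, the paths of Brownian motion cannot be continuously differentiable (in fact they are nowhere differentiable, though we do not prove this here).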

Some words of intuition. Since $W_t \sim N(0,t)$, its moment generating function $M(a)$ is given by

$$M(a) = E[\exp(aW_t)] = \exp\left(\tfrac{1}{2}a^2t\right).$$

Expanding the exponentials as Taylor series in powers of $a$ yields $E[W_t] = 0$, $E[W_t^2] = t$, $E[W_t^3] = 0$, $E[W_t^4] = 3t^2$. Hence the variance of $W_t^2$ is

$$\mathrm{var}[W_t^2] = E[W_t^4] - (E[W_t^2])^2 = 3t^2 - t^2 = 2t^2.$$

The important observation is that, for small $t$, the variance $2t^2$ of $W_t^2$ will be negligible compared to its expected value $t$. Put another way, the randomness in $W_t^2$ is small compared to its mean, for small $t$ (Remark 9.14 below explains why this heuristic must be handled with care: the standard deviation $\sqrt{2}\,t$ is of the same order as the mean). This suggests that if we take a fine enough partition $\mathcal{P}$ of $[0,t]$, a finite set of points $0 = t_0 < t_1 < \dots < t_n = t$ with mesh $\|\mathcal{P}\| = \max_k |t_{k+1} - t_k|$ small enough, then, writing $D_k := W_{t_{k+1}} - W_{t_k}$ and $\Delta t_k := t_{k+1} - t_k$, we conjecture that $\sum_{k=0}^{n-1} D_k^2$ will "closely resemble"

$$\sum_{k=0}^{n-1} E[D_k^2] = \sum_{k=0}^{n-1} \Delta t_k = t.$$

This can be made rigorous, as we show below, and the limit of $\sum_{k=0}^{n-1} D_k^2$ as the partition becomes finer is the quadratic variation of Brownian motion over the interval $[0,t]$.

We shall first prove that the quadratic variation of Brownian motion over $[0,t]$ is equal to $t$ in mean square, and then we shall prove that the result holds almost surely (the almost sure convergence proof is not examinable).

Recall that a sequence $(X_n)_{n\in\mathbb{N}}$ of random variables converges in mean square (or in $L^2(\Omega, \mathcal{F}, P)$) to a random variable $X$ if $E[|X_n - X|^2] \to 0$ as $n \to \infty$, and converges to $X$ almost surely if $P\{\omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\} = 1$.
Proof of Theorem 9.10 I: convergence in $L^2$. Let $\mathcal{P} = \{t_0, t_1, \dots, t_n\}$ be a partition of $[0,t]$. Set $D_k := W_{t_{k+1}} - W_{t_k}$ and define the sample quadratic variation

$$Q_{\mathcal{P}} := \sum_{k=0}^{n-1} D_k^2.$$

Then

$$Q_{\mathcal{P}} - t = \sum_{k=0}^{n-1}\left[D_k^2 - (t_{k+1} - t_k)\right].$$

We want to show that $\lim_{\|\mathcal{P}\|\to 0}(Q_{\mathcal{P}} - t) = 0$ in mean square. Consider an individual summand $D_k^2 - (t_{k+1} - t_k)$. This has expectation zero, so

$$E[Q_{\mathcal{P}} - t] = E\left[\sum_{k=0}^{n-1}\left[D_k^2 - (t_{k+1} - t_k)\right]\right] = 0.$$

Therefore, if we compute $E[(Q_{\mathcal{P}} - t)^2] = \mathrm{var}(Q_{\mathcal{P}} - t)$ and find it to approach zero as $\|\mathcal{P}\| \to 0$, then we have shown that the quadratic variation of Brownian motion is equal to $t$ in mean square or, equivalently, that $\mathrm{var}(Q_{\mathcal{P}}) \to 0$ as $\|\mathcal{P}\| \to 0$, so that $Q_{\mathcal{P}}$ essentially becomes non-stochastic as $\|\mathcal{P}\| \to 0$.

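For $j \ne k$, the terms $D_j^2 - (t_{j+1} - t_j)$ and $D_k^2 - (t_{k+1} - t_k)$ are independent (due to the independent increments property of BM), so the variance of the sum is the sum of the variances. Using $E[D_k^2] = t_{k+1} - t_k$ and $E[D_k^4] = 3(t_{k+1} - t_k)^2$, we get

$$\begin{aligned} \mathrm{var}(Q_{\mathcal{P}} - t) &= \sum_{k=0}^{n-1}\mathrm{var}\left[D_k^2 - (t_{k+1} - t_k)\right] \\ &= \sum_{k=0}^{n-1}E\left[D_k^4 - 2(t_{k+1} - t_k)D_k^2 + (t_{k+1} - t_k)^2\right] \\ &= \sum_{k=0}^{n-1}\left[3(t_{k+1} - t_k)^2 - 2(t_{k+1} - t_k)^2 + (t_{k+1} - t_k)^2\right] \\ &= 2\sum_{k=0}^{n-1}(t_{k+1} - t_k)^2 \\ &\le 2\|\mathcal{P}\|\sum_{k=0}^{n-1}(t_{k+1} - t_k) \\ &= 2\|\mathcal{P}\|\,t. \end{aligned}$$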

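Thus we have

$$E[Q_{\mathcal{P}} - t] = 0, \qquad \mathrm{var}(Q_{\mathcal{P}} - t) \le 2\|\mathcal{P}\|\,t.$$

As $\|\mathcal{P}\| \to 0$ (which forces $n \to \infty$), $\mathrm{var}(Q_{\mathcal{P}} - t) \to 0$, that is, $E[(Q_{\mathcal{P}} - t)^2] \to 0$, so

$$Q_{\mathcal{P}} \to t \quad \text{in } L^2.$$

□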
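A Monte Carlo sketch of the $L^2$ argument, assuming uniform partitions $t_k = kt/n$. For these the computation above gives exactly $E[Q_{\mathcal{P}}] = t$ and $\mathrm{var}(Q_{\mathcal{P}}) = 2\sum_k (t/n)^2 = 2t^2/n$. The function names, seed and number of paths are illustrative choices (standard-library Python).

```python
import random

def sample_qv(t, n, rng):
    """Sample quadratic variation of one Brownian path on the uniform partition of [0, t] into n pieces."""
    sd = (t / n) ** 0.5
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(n))   # sum of squared N(0, t/n) increments

def qv_stats(t, n, n_paths=1000, seed=3):
    """Sample mean and sample variance of Q_P over n_paths independent paths."""
    rng = random.Random(seed)
    qs = [sample_qv(t, n, rng) for _ in range(n_paths)]
    mean = sum(qs) / n_paths
    var = sum((q - mean) ** 2 for q in qs) / (n_paths - 1)
    return mean, var

t = 1.0
for n in (10, 100, 400):
    mean, var = qv_stats(t, n)
    print(n, round(mean, 4), round(var, 5), 2 * t * t / n)   # theory: mean t, variance 2 t^2 / n
```

The sample mean stays near $t$ for every $n$, while the sample variance falls roughly in proportion to $1/n$. This is the sense in which $Q_{\mathcal{P}}$ "becomes non-stochastic".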
Remark 9.11 (Mean square versus almost sure convergence). We have shown mean square convergence (or $L^2$ convergence) of the sample quadratic variation $Q_{\mathcal{P}}$ to $t$. When such convergence takes place, there is a subsequence of partitions (a sequence of ever finer partitions of $[0,t]$) along which the convergence is almost sure (that is, the convergence takes place for all paths except for a set of paths having probability zero). We shall not delve into the subtle differences among different modes of convergence of random variables, but the next proof shows how one can establish almost sure convergence of the sample quadratic variation to $t$.

Proof of Theorem 9.10 II: a.s. convergence*. To show that the convergence is also almost sure, consider the dyadic partition $t_k = kt/2^m$, $k = 0, 1, \dots, 2^m$, i.e. we partition $[0,t]$ into $2^m$ intervals of width $t/2^m$, so that the mesh of the partition approaches zero as $m \to \infty$. Then the sample quadratic variation over $[0,t]$ may be written as

$$Q_m(t) := \sum_{k=0}^{2^m-1}\left(W_{(k+1)t/2^m} - W_{kt/2^m}\right)^2 =: \sum_{k=0}^{2^m-1}(\Delta W_k)^2,$$

where we have written $\Delta W_k = W_{(k+1)t/2^m} - W_{kt/2^m}$. We have $\Delta W_k \sim N(0, t/2^m)$, $\Delta W_k$ and $\Delta W_j$ are independent for $k \ne j$, and hence $(\Delta W_k)^2$ and $(\Delta W_j)^2$ are independent for $k \ne j$. Recall that for $X \sim N(0,v)$ we have $E[X^4] = 3v^2$, so that

$$\mathrm{var}[X^2] = E[X^4] - (E[X^2])^2 = 3v^2 - v^2 = 2v^2.$$

Therefore, from $E[(\Delta W_k)^2] = t/2^m$ we get

$$E[Q_m(t)] = t,$$

regardless of $m$. Further, by the independence of the squared increments we have

$$E[(Q_m(t) - t)^2] = \mathrm{var}(Q_m(t)) = \mathrm{var}\left(\sum_{k=0}^{2^m-1}(\Delta W_k)^2\right) = \sum_{k=0}^{2^m-1} 2\left(\frac{t}{2^m}\right)^2 = \frac{2t^2}{2^m} \to 0, \quad \text{as } m \to \infty.$$

Therefore, since the limit of $Q_m(t)$ as $m \to \infty$ is $[W]_t$, we have established the mean square convergence

$$[W]_t = \lim_{m\to\infty} Q_m(t) = t, \quad \text{in } L^2.$$






Now we show almost sure convergence using the Chebyshev inequality and the Borel-Cantelli lemmas (see, for instance, Grimmett and Stirzaker [5], Section 7.3).⁵ By Chebyshev's inequality we have, for $a > 0$,

$$P\{|Q_m(t) - t| > a\} \le \frac{1}{a^2}E[(Q_m(t) - t)^2] = \frac{2t^2}{a^2\,2^m}.$$

So

$$P\{|Q_m(t) - t| > 1/m\} \le m^2 E[(Q_m(t) - t)^2] = \frac{2t^2m^2}{2^m}.$$

Write $A_m = \{|Q_m(t) - t| > 1/m\}$, and consider the sequence of events $(A_m)_{m=1}^\infty$. Then $\sum_{m=1}^\infty P(A_m) \le 2t^2\sum_{m=1}^\infty m^2/2^m < \infty$, so by the (first) Borel-Cantelli lemma, the event that infinitely many of the $A_m$ occur has probability

$$P\left(\limsup_{m\to\infty} A_m\right) = P\left(\bigcap_{m=1}^\infty\bigcup_{k=m}^\infty A_k\right) = 0.$$

In other words, $|Q_m(t) - t| \le 1/m$ for all sufficiently large $m$, almost surely, or

$$[W]_t = \lim_{m\to\infty} Q_m(t) = t, \quad \text{almost surely}.$$

□
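To see the almost-sure statement on a single path, one can simulate the increments on the finest dyadic grid and obtain every coarser $Q_m(t)$ from the same path by merging adjacent increments. This Python sketch is an illustration, not a proof. It assumes $t = 1$, a finest level of $2^{16}$ steps and one fixed seed, and its function name is hypothetical.

```python
import random

def dyadic_qv_one_path(t=1.0, max_level=16, seed=4):
    """Return [Q_0(t), Q_1(t), ..., Q_max_level(t)] computed from ONE simulated Brownian path."""
    rng = random.Random(seed)
    sd = (t / 2 ** max_level) ** 0.5
    incs = [rng.gauss(0.0, sd) for _ in range(2 ** max_level)]   # increments on the finest grid
    q = []
    for _ in range(max_level + 1):
        q.append(sum(d * d for d in incs))                        # Q_m(t) at the current level
        incs = [incs[i] + incs[i + 1] for i in range(0, len(incs) - 1, 2)]  # merge pairs: level m -> m - 1
    return q[::-1]                                                # q[m] = Q_m(t)

q = dyadic_qv_one_path()
for m in (2, 6, 10, 14, 16):
    print(m, round(q[m], 5), abs(q[m] - 1.0) <= 1.0 / m)
```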



⁵ Chebyshev's inequality follows from the following result, which is Theorem 7.3.1 in [5].
Theorem 9.12. Let $h : \mathbb{R} \to [0,\infty)$ be a non-negative function. Then

$$P(h(X) \ge a) \le \frac{E[h(X)]}{a}, \quad a > 0.$$

Proof. Let $A := \{h(X) \ge a\}$. Then $h(X) \ge a\mathbf{1}_A$. Taking expectations gives the result. □

Setting $h(x) = |x|$ gives Markov's inequality. Taking $h(x) = x^2$ gives Chebyshev's inequality: $P(|X| \ge a) \le E[X^2]/a^2$.

The Borel-Cantelli lemmas (Theorem 7.3.10 in [5]) state:

Theorem 9.13 (Borel-Cantelli lemmas). Let $A_1, A_2, \dots$ be an infinite sequence of events from some probability space $(\Omega, \mathcal{F}, P)$. Let $A$ be the event that infinitely many of the $A_n$ occur (written $\{A_n \text{ infinitely often}\} = \{A_n \text{ i.o.}\}$), given by

$$A := \{A_n \text{ i.o.}\} = \limsup_{n\to\infty} A_n = \bigcap_{n=1}^\infty\bigcup_{k=n}^\infty A_k.$$

Then:

1. $P(A) = 0$ if $\sum_{n=1}^\infty P(A_n) < \infty$;

2. $P(A) = 1$ if $\sum_{n=1}^\infty P(A_n) = \infty$ and $A_1, A_2, \dots$ are independent events.
9.6 Path length*

Given a continuous function $f : [0,t] \to \mathbb{R}$, its total variation over $[0,t]$ is defined, using partitions $\mathcal{P} = \{0 = t_0 < t_1 < \dots < t_n = t\}$ of $[0,t]$, by

$$TV(f) \equiv [f,f]^{(1)}(t) = \lim_{\|\mathcal{P}\|\to 0}\sum_{i=0}^{n-1}|f(t_{i+1}) - f(t_i)|.$$

This may be infinite, or some finite number, in which case we say that $f$ has bounded variation.

Consider an element of arc length $\Delta s_i$ along $f$ in the interval $[t_i, t_{i+1}]$. If this interval is small, we have $(\Delta s_i)^2 \approx (\Delta t_i)^2 + (\Delta f_i)^2$, where we have written $\Delta t_i = t_{i+1} - t_i$ and $\Delta f_i = f(t_{i+1}) - f(t_i)$. By the triangle inequality we have

$$|\Delta f_i| \le |\Delta s_i| \le |\Delta f_i| + |\Delta t_i|.$$

Denoting the total arc length (or path length) of $f$ over $[0,t]$ by $s(f)$, we therefore have, in the limit $\|\mathcal{P}\| \to 0$,

$$TV(f) \le s(f) \le TV(f) + t.$$

Therefore,

$$\text{finite path length} \iff TV(f) < \infty.$$

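In contrast, the quadratic variation of $f$ over $[0,t]$ is

$$[f]_t = \lim_{\|\mathcal{P}\|\to 0}\sum_{i=0}^{n-1}|\Delta f_i|^2 \le \lim_{\|\mathcal{P}\|\to 0}\left(\max_{i=0,\dots,n-1}|\Delta f_i|\right)\lim_{\|\mathcal{P}\|\to 0}\sum_{i=0}^{n-1}|\Delta f_i| = \lim_{\|\mathcal{P}\|\to 0}\left(\max_{i=0,\dots,n-1}|\Delta f_i|\right)TV(f).$$

For any continuous function, $\lim_{\|\mathcal{P}\|\to 0}\left(\max_{i=0,\dots,n-1}|\Delta f_i|\right) = 0$,⁶ so we conclude that

$$TV(f) < \infty \implies [f]_t = 0 \quad \text{for all } t > 0.$$

Since $[W]_t = t > 0$ (Theorem 9.10), the contrapositive shows that paths of Brownian motion $(W_s)_{0\le s\le t}$ over the interval $[0,t]$ have infinite total variation, and hence infinite path length.

Because the total variation of Brownian motion is infinite (i.e. Brownian paths are "very long"), one is not able to give meaning to integrals with respect to Brownian motion, $\int_0^t b_s\,dW_s$, via a path-by-path procedure. Thus we are led to a new type of integral, the Ito stochastic integral, which we shall describe shortly.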



⁶ This is a standard theorem from real analysis, proven from compactness arguments: a continuous function on the compact interval $[0,t]$ is uniformly continuous.
Remark 9.14 (Heuristics). If we (formally) write $dW_t$ for the infinitesimal increase in $W_t$ corresponding to the infinitesimal time interval $dt$, then we have "$\int_0^t dW_s\,dW_s = t$", which is often summarised by the formula

$$dW_t\,dW_t = dt.$$

A better way to write this would be

$$d[W]_t = dt.$$

Formally, note that if $dW_t\,dW_t = dt$, then in some sense $dW_t/dt = 1/\sqrt{dt} \to \infty$ as $dt \to 0$. In other words, Brownian motion is nowhere differentiable, in line with the non-differentiability of its paths noted after Theorem 9.10.
For the partition $\mathcal{P}$ defined by

$$0 = t_0 < t_1 < \dots < t_n = t,$$

we defined

$$D_k := W_{t_{k+1}} - W_{t_k}, \qquad \Delta t_k := t_{k+1} - t_k, \qquad k = 0, 1, \dots, n-1.$$

We have that

$$E[D_k^2] = \Delta t_k, \qquad \mathrm{var}(D_k^2) = 2(\Delta t_k)^2.$$

It is tempting to argue that, because the variance of $D_k^2$ is much smaller than its mean, we have $D_k^2 \approx \Delta t_k$ for small $\Delta t_k$. But this equation has no content: when $\Delta t_k$ is small, it would be true because both sides are near zero. A better way to capture what we think is going on might be to write

$$\frac{D_k^2}{\Delta t_k} \approx 1.$$

But this is never true either. The left-hand side is the square of the standard normal random variable

$$Y_k := \frac{D_k}{\sqrt{\Delta t_k}} \sim N(0,1),$$

whose distribution is the same no matter how small we make $\Delta t_k$.

To better understand what is going on, for some large positive integer $n$, define $t_k := kt/n$, $k = 0, 1, \dots, n$, so that $\Delta t_k = t/n$ for all $k = 0, 1, \dots, n-1$. Then

$$D_k^2 = t\,\frac{Y_k^2}{n}, \qquad k = 0, 1, \dots, n-1.$$

The random variables $Y_0, Y_1, \dots, Y_{n-1}$ are i.i.d., so the Law of Large Numbers implies that $\frac{1}{n}\sum_{k=0}^{n-1} Y_k^2$ converges to the common mean $E[Y_k^2] = 1$ as $n \to \infty$, and hence $\sum_{k=0}^{n-1} D_k^2$ converges to $t$. Each of the terms $D_k^2$ in this sum can be quite different from its mean $\Delta t_k = t/n$, but when we sum many terms like this, the differences average out to zero.

So the point is that although we frequently write $dW_t\,dW_t = dt$, this has no rigorous mathematical meaning unless we consider the integrated relation $[W]_t = t$.
Remark 9.15. For the partition $\mathcal{P}$ defined by

$$0 = t_0 < t_1 < \dots < t_n = t,$$

we have computed the quadratic variation

$$[W]_t := \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1} D_k^2 = t. \quad (9.5)$$

In addition to this, we can compute the cross variation of $W_t$ with $t$, and the quadratic variation of $t$, given by

$$\lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1} D_k\,\Delta t_k = 0, \qquad \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1}(\Delta t_k)^2 = 0. \quad (9.6)$$

We know that the second of these limits is zero since $t$ is a differentiable function (see Lemma 9.9, or the argument below). To see that the first limit in (9.6) is zero, observe that

$$|D_k\,\Delta t_k| \le \left(\max_{0\le j\le n-1}|D_j|\right)\Delta t_k,$$

and hence

$$\left|\sum_{k=0}^{n-1} D_k\,\Delta t_k\right| \le \left(\max_{0\le j\le n-1}|D_j|\right)t,$$

which converges to zero as $\|\mathcal{P}\| \to 0$ since $W$ is continuous.
For the second limit in (9.6) we observe that

$$\sum_{k=0}^{n-1}(\Delta t_k)^2 \le \|\mathcal{P}\|\sum_{k=0}^{n-1}\Delta t_k = \|\mathcal{P}\|\,t,$$

which clearly converges to zero as $\|\mathcal{P}\| \to 0$.

Just as we informally write $dW_t\,dW_t = dt$ for (9.5), we capture (9.6) by writing

$$dW_t\,dt = 0, \qquad dt\,dt = 0.$$

9.7 Other variations of Brownian motion 

Consider the first variation (or total variation) of BM, denoted by

$$TV(W)_t := \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1}|D_k|.$$

Lemma 9.16. The first variation of BM is infinite, $TV(W)_t = \infty$, for $t > 0$ (almost surely).
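Proof. We have

$$t = [W]_t = \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1} D_k^2 \le \left(\lim_{\|\mathcal{P}\|\to 0}\max_{0\le j\le n-1}|D_j|\right)TV(W)_t.$$

By continuity of $W$, $\max_j |D_j| \to 0$ as $\|\mathcal{P}\| \to 0$, so $TV(W)_t < \infty$ would imply $[W]_t = 0$, which is false. So the result follows. □

For the third variation, defined by

$$[W,W]_t^{(3)} := \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1}|D_k|^3,$$

we have:

Lemma 9.17.

$$[W,W]_t^{(3)} = 0, \quad t \ge 0.$$

Proof.

$$[W,W]_t^{(3)} := \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1}|D_k|^3 \le [W]_t\left(\lim_{\|\mathcal{P}\|\to 0}\max_{0\le j\le n-1}|D_j|\right) = 0.$$

□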
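Lemma 9.16, Theorem 9.10 and Lemma 9.17 can be compared on one simulated path. This minimal Python sketch assumes dyadic grids on $[0,1]$ with up to $2^{16}$ steps, a fixed seed, and a hypothetical function name. On the uniform grid with $n$ steps, $E\sum|D_k| = \sqrt{2tn/\pi}$ grows without bound (infinite path length), $\sum D_k^2$ stays near $t$, and $E\sum|D_k|^3 = 2\sqrt{2/\pi}\,t^{3/2}/\sqrt{n}$ tends to zero.

```python
import math
import random

def variations_one_path(t=1.0, max_level=16, seed=5):
    """For one simulated Brownian path, return {m: (sum|D|, sum D^2, sum|D|^3)} on the dyadic grid with 2**m steps."""
    rng = random.Random(seed)
    sd = (t / 2 ** max_level) ** 0.5
    incs = [rng.gauss(0.0, sd) for _ in range(2 ** max_level)]
    out = {}
    for m in range(max_level, -1, -1):
        out[m] = tuple(sum(abs(d) ** p for d in incs) for p in (1, 2, 3))
        incs = [incs[i] + incs[i + 1] for i in range(0, len(incs) - 1, 2)]   # coarsen the grid
    return out

out = variations_one_path()
for m in (4, 8, 12, 16):
    tv, qv, v3 = out[m]
    print(m, round(tv, 2), round(math.sqrt(2 * 2 ** m / math.pi), 2), round(qv, 4), round(v3, 5))
```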



9.8 Lévy's characterisation of Brownian motion*

BM $W$ is a martingale with continuous paths whose quadratic variation is $[W]_t = t$. In fact, this is a complete characterisation of BM, given in the following theorem (see Shreve [13], Section 4.6.3 for more details).

Theorem 9.18 (Lévy's theorem, 1-dimensional). Let $M$ be a martingale relative to a filtration, with $M_0 = 0$, continuous paths, and $[M]_t = t$ for all $t \ge 0$. Then $M$ is a BM.

10 The Ito integral 

We consider how to define an integral with respect to Brownian motion. The probability space $(\Omega, \mathcal{F}, P)$ (with $\mathbb{F} = (\mathcal{F}_t)_{t\ge 0}$ the filtration generated by Brownian motion) is given, and always lurks in the background, even when not explicitly mentioned. Recall that Brownian motion, $W_t(\omega) : [0,\infty) \times \Omega \to \mathbb{R}$, has the properties:

1. $W_0 = 0$ (technically, $P\{\omega : W_0(\omega) = 0\} = 1$);

2. $W_t$ is a continuous function of $t$;

3. If $0 = t_0 < t_1 < \dots < t_n = t$, then the increments

$$W_{t_1} - W_{t_0}, \dots, W_{t_n} - W_{t_{n-1}}$$

are independent, normal, with

$$E[W_{t_{k+1}} - W_{t_k}] = 0, \qquad E[(W_{t_{k+1}} - W_{t_k})^2] = t_{k+1} - t_k, \qquad k = 0, 1, \dots, n-1.$$
10.1 Construction of the Ito integral

We want to construct the Ito integral, which we write as

$$I_t = \int_0^t b_s\,dW_s, \quad t \ge 0.$$

The integrator is Brownian motion, $(W_t)_{t\ge 0}$, with associated filtration $(\mathcal{F}_t)_{t\ge 0}$ and the following properties:

1. for $s < t$, every set in $\mathcal{F}_s$ is also in $\mathcal{F}_t$;

2. $W_t$ is $\mathcal{F}_t$-measurable, $\forall t \ge 0$;

3. for $t < t_1 < \dots < t_n$, the increments $W_{t_1} - W_t, W_{t_2} - W_{t_1}, \dots, W_{t_n} - W_{t_{n-1}}$ are independent of $\mathcal{F}_t$.

The integrand is a process $b = (b_t)_{t\ge 0}$, where

1. $b_t$ is $\mathcal{F}_t$-measurable $\forall t \ge 0$ (i.e. $(b_t)_{t\ge 0}$ is adapted to the filtration $(\mathcal{F}_t)_{t\ge 0}$);

2. $b$ is square-integrable:

$$E\left[\int_0^t b_s^2\,ds\right] < \infty, \quad \forall t \ge 0.$$



Remark 10.1. For a differentiable function $f(t)$, we can define

$$\int_0^t b(s)\,df(s) = \int_0^t b(s)f'(s)\,ds.$$

This won't work when the integrator is Brownian motion, because the paths of Brownian motion are not differentiable.

10.2 Ito Integral of an elementary integrand 

Let $\mathcal{P} = \{t_0, t_1, \dots, t_n\}$ be a partition of $[0,t]$, i.e.

$$0 = t_0 < t_1 < \dots < t_n = t.$$

Assume that $b$ is constant on each interval $[t_k, t_{k+1})$, such that $b_{t^*} = b_{t_k}$ for $t^* \in [t_k, t_{k+1})$. We call such a process $b$ an elementary process, or a simple process. The Ito integral $I_t$ of such a process is defined by

$$I_t := \sum_{k=0}^{n-1} b_{t_k}(W_{t_{k+1}} - W_{t_k}), \quad t \ge 0.$$



Remark 10.2 (Interpretation as gains from trading). We can interpret the functions b and 
W as follows: 
• Think of $W_t$ as the price per share of an asset at time $t$.

• Think of $t_0, t_1, \dots, t_n$ as the trading dates for the asset.

• Think of $b_{t_k}$ as the number of shares of the asset held in the interval $[t_k, t_{k+1})$, i.e. acquired at trading date $t_k$ and held until trading date $t_{k+1}$ (so the process $\pi$, defined such that $\pi_{t_{k+1}} = b_{t_k}$, is a predictable process).

Then the Ito integral $I_t$ can be interpreted as the gain from trading at time $t$.

Definition 10.3 (Ito integral of elementary process). If $t_k \le t < t_{k+1}$, then the Ito integral $I_t = \int_0^t b_s\,dW_s$ of the elementary process $b$ is defined by

$$I_t := \int_0^t b_s\,dW_s := \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}), \quad t \ge 0.$$
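Definition 10.3 is a finite computation, which the following Python sketch spells out in the trading language of Remark 10.2 (holdings $b_{t_k}$, price path $W$). It is an illustration under stated assumptions. The function name `elementary_ito` is hypothetical. The deterministic stand-in path $W(s) = s^2$ is used only to make the arithmetic easy to follow. For a genuine Ito integral one would pass a simulated Brownian path and holdings chosen using information up to each $t_k$ only.

```python
from bisect import bisect_right

def elementary_ito(times, b, W, t):
    """Ito integral I_t of an elementary process, as in Definition 10.3.

    times -- partition points 0 = t_0 < t_1 < ... < t_n (sorted)
    b     -- b[k] is the holding b_{t_k} on [t_k, t_{k+1}), for k = 0, ..., n - 1
    W     -- callable giving the integrator path W(s)
    t     -- upper limit, assumed to satisfy times[0] <= t <= times[-1]
    """
    k = bisect_right(times, t) - 1        # index k with t_k <= t < t_{k+1}
    k = min(k, len(times) - 2)            # t = t_n: treat it as the end of the last interval
    gains = sum(b[j] * (W(times[j + 1]) - W(times[j])) for j in range(k))   # completed intervals
    return gains + b[k] * (W(t) - W(times[k]))                             # current, partial interval

times, holdings = [0.0, 1.0, 2.0, 3.0], [2.0, -1.0, 3.0]
path = lambda s: s * s                   # deterministic stand-in for a price path
print(elementary_ito(times, holdings, path, 2.5))   # 2*1 + (-1)*3 + 3*(6.25 - 4) = 5.75
```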

10.3 Properties of the Ito integral of an elementary process 
Adaptedness For each $t \ge 0$, $I_t$ is $\mathcal{F}_t$-measurable;

Linearity With

$$I_t = \int_0^t b_s\,dW_s, \qquad J_t = \int_0^t a_s\,dW_s,$$

then for $\alpha, \beta \in \mathbb{R}$,

$$\alpha I_t + \beta J_t = \int_0^t (\alpha b_s + \beta a_s)\,dW_s.$$

Martingale property $(I_t)_{t\ge 0}$ is a martingale. Let us prove this for the case of an integrand which is an elementary process.

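Theorem 10.4 (Martingale property). The process $I = (I_t)_{t\ge 0}$ defined by

$$I_t := \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}), \quad t_k \le t < t_{k+1},$$

is a $(P,\mathbb{F})$-martingale.

Proof. Let $0 \le s < t$ be given. We treat the more difficult case that $s$ and $t$ are in different subintervals, i.e. there are partition points $t_\ell$ and $t_k$, with $\ell < k$, such that $s \in [t_\ell, t_{\ell+1})$ and $t \in [t_k, t_{k+1})$. Write

$$\begin{aligned} I_t &= \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}) \\ &= \sum_{j=0}^{\ell-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_\ell}(W_{t_{\ell+1}} - W_{t_\ell}) + \sum_{j=\ell+1}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}). \end{aligned} \quad (10.1)$$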
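Compute conditional expectations. For $0 \le s < t$, we have

$$E\left[\sum_{j=0}^{\ell-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) \,\middle|\, \mathcal{F}_s\right] = \sum_{j=0}^{\ell-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}),$$

since every term is $\mathcal{F}_{t_\ell}$-measurable and $t_\ell \le s$, and

$$E\left[b_{t_\ell}(W_{t_{\ell+1}} - W_{t_\ell}) \mid \mathcal{F}_s\right] = b_{t_\ell}\left(E[W_{t_{\ell+1}} \mid \mathcal{F}_s] - W_{t_\ell}\right) = b_{t_\ell}(W_s - W_{t_\ell}).$$

These are the conditional expectations of the first two terms on the RHS of (10.1). They add up to $I_s$ and so contribute this to $E[I_t \mid \mathcal{F}_s]$. We show that the third and fourth terms contribute zero:

$$E\left[\sum_{j=\ell+1}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) \,\middle|\, \mathcal{F}_s\right] = \sum_{j=\ell+1}^{k-1} E\left[E[b_{t_j}(W_{t_{j+1}} - W_{t_j}) \mid \mathcal{F}_{t_j}] \mid \mathcal{F}_s\right] = \sum_{j=\ell+1}^{k-1} E\left[b_{t_j}\left(E[W_{t_{j+1}} \mid \mathcal{F}_{t_j}] - W_{t_j}\right) \mid \mathcal{F}_s\right] = 0,$$

and

$$E\left[b_{t_k}(W_t - W_{t_k}) \mid \mathcal{F}_s\right] = E\left[b_{t_k}\left(E[W_t \mid \mathcal{F}_{t_k}] - W_{t_k}\right) \mid \mathcal{F}_s\right] = 0.$$

□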



The Ito isometry Because $(I_t)_{t\ge 0}$ is a martingale and $I_0 = 0$, we have $E[I_t] = 0$ for all $t \ge 0$. It follows that $\mathrm{var}(I_t) = E[I_t^2]$, a quantity given by the formula in the next theorem.

Theorem 10.5 (Ito isometry). The Ito integral of the elementary process $b$, defined by

$$I_t := \sum_{j=0}^{k-1} b_{t_j}(W_{t_{j+1}} - W_{t_j}) + b_{t_k}(W_t - W_{t_k}), \quad (10.2)$$

satisfies

$$E[I_t^2] = E\left[\int_0^t b_s^2\,ds\right], \quad t \ge 0.$$

Proof. To simplify notation, write $D_j = W_{t_{j+1}} - W_{t_j}$, $j = 0, \dots, k-1$, and $D_k = W_t - W_{t_k}$, so that (10.2) is written as $I_t = \sum_{j=0}^k b_{t_j}D_j$. Then

$$I_t^2 = \sum_{j=0}^k b_{t_j}^2 D_j^2 + 2\sum_{0\le i<j\le k} b_{t_i}b_{t_j}D_iD_j.$$

First we show that the expected value of the cross terms is zero. For $i < j$, the random variable $b_{t_i}b_{t_j}D_i$ is $\mathcal{F}_{t_j}$-measurable, while the Brownian increment $D_j$ is independent of $\mathcal{F}_{t_j}$, so $E[D_j \mid \mathcal{F}_{t_j}] = E[D_j] = 0$. Therefore,

$$\begin{aligned} E[b_{t_i}b_{t_j}D_iD_j] &= E\left[E[b_{t_i}b_{t_j}D_iD_j \mid \mathcal{F}_{t_j}]\right] \\ &= E\left[b_{t_i}b_{t_j}D_i\,E[D_j \mid \mathcal{F}_{t_j}]\right] \\ &= 0. \end{aligned}$$
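Now consider the square terms $b_{t_j}^2D_j^2$. The random variable $b_{t_j}^2$ is $\mathcal{F}_{t_j}$-measurable, while the squared Brownian increment $D_j^2$ is independent of $\mathcal{F}_{t_j}$, so $E[D_j^2 \mid \mathcal{F}_{t_j}] = E[D_j^2] = t_{j+1} - t_j$ for $j = 0, \dots, k-1$, and $E[D_k^2 \mid \mathcal{F}_{t_k}] = E[D_k^2] = t - t_k$. Therefore,

$$\begin{aligned} E[I_t^2] &= \sum_{j=0}^k E[b_{t_j}^2D_j^2] \\ &= \sum_{j=0}^k E\left[E[b_{t_j}^2D_j^2 \mid \mathcal{F}_{t_j}]\right] \\ &= \sum_{j=0}^k E\left[b_{t_j}^2\,E[D_j^2 \mid \mathcal{F}_{t_j}]\right] \\ &= \sum_{j=0}^k E\left[b_{t_j}^2\,E[D_j^2]\right] \\ &= \sum_{j=0}^{k-1} E\left[b_{t_j}^2(t_{j+1} - t_j)\right] + E\left[b_{t_k}^2(t - t_k)\right]. \end{aligned}$$

But $b_{t_j}$ is constant on $[t_j, t_{j+1})$, so $b_{t_j}^2(t_{j+1} - t_j) = \int_{t_j}^{t_{j+1}} b_s^2\,ds$, and similarly $b_{t_k}^2(t - t_k) = \int_{t_k}^t b_s^2\,ds$. Therefore

$$E[I_t^2] = \sum_{j=0}^{k-1} E\left[\int_{t_j}^{t_{j+1}} b_s^2\,ds\right] + E\left[\int_{t_k}^t b_s^2\,ds\right] = E\left[\int_0^t b_s^2\,ds\right].$$

□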
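A Monte Carlo check of the Ito isometry. It assumes the hypothetical elementary integrand $b_s = W_{t_k}$ for $s \in [t_k, t_{k+1})$ on a uniform grid of $[0,t]$ with $n$ steps. For this integrand both sides of the isometry equal $\sum_{k=0}^{n-1} t_k\,\Delta t = t^2(n-1)/(2n)$, which is $0.45$ for $t = 1$, $n = 10$. The names, seed and sample size are illustrative (standard-library Python).

```python
import random

def isometry_check(t=1.0, n=10, n_paths=40000, seed=6):
    """Estimate E[I_t^2] and E[int_0^t b_s^2 ds] for b_s = W_{t_k} on [t_k, t_{k+1})."""
    rng = random.Random(seed)
    dt = t / n
    sum_i2, sum_b2 = 0.0, 0.0
    for _ in range(n_paths):
        w, i_t, int_b2 = 0.0, 0.0, 0.0
        for _ in range(n):
            b = w                                # b_{t_k} = W_{t_k}, known at time t_k
            d = rng.gauss(0.0, dt ** 0.5)        # increment W_{t_{k+1}} - W_{t_k}
            i_t += b * d
            int_b2 += b * b * dt                 # integral of b_s^2 over [t_k, t_{k+1})
            w += d
        sum_i2 += i_t * i_t
        sum_b2 += int_b2
    return sum_i2 / n_paths, sum_b2 / n_paths

lhs, rhs = isometry_check()
print(lhs, rhs, 1.0 * (10 - 1) / (2 * 10))   # both should be close to t^2 (n - 1) / (2 n) = 0.45
```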



Quadratic variation of the integral We now consider the quadratic variation of the integral, thought of as the quadratic variation process $([I]_t)_{t\ge 0}$ of the integral process $I = (I_t)_{t\ge 0}$. Brownian motion $W_t = \int_0^t 1\,dW_s$ has quadratic variation $[W]_t = \int_0^t 1^2\,d[W]_s = \int_0^t 1\,ds = t$. We say that Brownian motion accumulates quadratic variation at the rate of one per unit time. In the Ito integral $I_t = \int_0^t b_s\,dW_s$, BM is scaled in a time- and path-dependent way (i.e. depending on $(s,\omega) \in [0,t] \times \Omega$) by the integrand $b_s$. Because increments are squared in the computation of quadratic variation, the QV of BM will be scaled by $b_s^2$ as it enters the integral. The following theorem gives the precise statement.

Theorem 10.6 (Quadratic variation of the Ito integral). Let $b$ be a simple process. Then the Ito integral

$$I_t = \int_0^t b_s\,dW_s, \quad t \ge 0,$$

has quadratic variation process $([I]_t)_{t\ge 0}$ given by

$$[I]_t = \int_0^t b_s^2\,ds, \quad t \ge 0.$$

We say that the Ito integral accumulates quadratic variation at a rate $b_s^2$, $s \in [0,t]$, per unit time, and that the quadratic variation accumulated up to time $t$ by the integral is $\int_0^t b_s^2\,ds$.

Proof. First compute the quadratic variation accumulated by the integral on one of the subintervals $[t_j, t_{j+1})$ on which $b_s = b_{t_j}$, $s \in [t_j, t_{j+1})$, is constant. Choose partition points

$$t_j = s_0 < s_1 < \dots < s_m = t_{j+1},$$

and consider

$$\sum_{i=0}^{m-1}(I_{s_{i+1}} - I_{s_i})^2 = \sum_{i=0}^{m-1}\left[b_{t_j}(W_{s_{i+1}} - W_{s_i})\right]^2 = b_{t_j}^2\sum_{i=0}^{m-1}(W_{s_{i+1}} - W_{s_i})^2. \quad (10.3)$$

As $m \to \infty$ and the mesh of the partition, $\max_{i=0,\dots,m-1}(s_{i+1} - s_i)$, approaches zero, the term $\sum_{i=0}^{m-1}(W_{s_{i+1}} - W_{s_i})^2$ converges to the QV accumulated by BM over $[t_j, t_{j+1})$, which is $t_{j+1} - t_j$. Therefore, the limit of the RHS of (10.3), which is the QV accumulated by the integral over $[t_j, t_{j+1})$, is

$$b_{t_j}^2(t_{j+1} - t_j) = \int_{t_j}^{t_{j+1}} b_s^2\,ds,$$

where we have used the fact that $b_s$ is constant for $s \in [t_j, t_{j+1})$. Similarly, the QV accumulated by the integral over $[t_k, t]$ is $\int_{t_k}^t b_s^2\,ds$. Adding up all these contributions proves the theorem.

□
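The proof can be mirrored on a single simulated path. This Python sketch assumes a hypothetical simple integrand $b_s = 1 + W_{t_j}$ on eight equal subintervals $[t_j, t_{j+1})$ of $[0,1]$. It approximates $[I]_t$ by summing squared increments of $I$ over a much finer grid inside each subinterval. The result should be close to $\int_0^t b_s^2\,ds = \sum_j b_{t_j}^2(t_{j+1} - t_j)$ computed on the same path. Grid sizes and seed are illustrative.

```python
import random

def ito_qv_one_path(t=1.0, n_coarse=8, m_fine=2048, seed=7):
    """Sample QV of I = int b dW versus int_0^t b_s^2 ds on one path, for b_s = 1 + W_{t_j} on [t_j, t_{j+1})."""
    rng = random.Random(seed)
    h = t / n_coarse                      # length of each interval on which b is constant
    sd = (h / m_fine) ** 0.5              # standard deviation of one fine Brownian increment
    w, qv, int_b2 = 0.0, 0.0, 0.0
    for _ in range(n_coarse):
        b = 1.0 + w                       # value of the simple integrand, fixed at t_j
        for _ in range(m_fine):
            d = rng.gauss(0.0, sd)
            qv += (b * d) ** 2            # squared increment of I over one fine step
            w += d
        int_b2 += b * b * h
    return qv, int_b2

qv, int_b2 = ito_qv_one_path()
print(qv, int_b2)
```

Both numbers are random, since they depend on the path. This is the point of Remark 10.7: only their common expectation is fixed by the Ito isometry.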

Informally, we establish the theorem in differential form via

$$dI_t = b_t\,dW_t \implies d[I]_t = dI_t\,dI_t = b_t^2\,dW_t\,dW_t = b_t^2\,d[W]_t = b_t^2\,dt,$$

just as we wrote $d[W]_t = dW_t\,dW_t = dt$ earlier. In fact, one can do a lot of the calculations in Ito calculus simply by applying the informal multiplication rules:

$$dW_t\,dW_t = dt, \qquad dW_t\,dt = dt\,dt = 0.$$

Remark 10.7. Note the contrast between Theorems 10.5 and 10.6. The QV $[I]_t$ is computed path by path, so the result can depend on the path, and so in principle is random. The variance of the integral is precisely the expectation of the QV, as given by the Ito isometry (i.e. it is an average over all possible paths of the QV), and so is non-random.
10.4 Ito integral of a general integrand*

Fix $t > 0$. Let $b$ be a process (not necessarily an elementary process) such that

• $b_s$ is $\mathcal{F}_s$-measurable, $\forall s \in [0,t]$;

• $E\left[\int_0^t b_s^2\,ds\right] < \infty$.

We then have the following result.

Theorem 10.8. There is a sequence of elementary processes $(b^{(n)})_{n\in\mathbb{N}}$ such that

$$\lim_{n\to\infty} E\left[\int_0^t |b_s^{(n)} - b_s|^2\,ds\right] = 0.$$

Proof. See [13], Section 4.3, [11], Section 3.1, or [9], Section 3.2 and Problem 3.2.5. □



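We have shown how to define

$$I_t^{(n)} = \int_0^t b_s^{(n)}\,dW_s,$$

for every $n \in \mathbb{N}$. We now define the general Ito integral by

$$\int_0^t b_s\,dW_s := \lim_{n\to\infty}\int_0^t b_s^{(n)}\,dW_s.$$

The only difficulty with this approach is that we need to make sure the above limit exists. Suppose $m$ and $n$ are large positive integers. Then

$$\begin{aligned} E\left[|I_t^{(n)} - I_t^{(m)}|^2\right] &= \mathrm{var}\left(I_t^{(n)} - I_t^{(m)}\right) \\ &= E\left[\left(\int_0^t (b_s^{(n)} - b_s^{(m)})\,dW_s\right)^2\right] \\ &= E\left[\int_0^t (b_s^{(n)} - b_s^{(m)})^2\,ds\right] && \text{(Ito isometry)} \\ &\le E\left[\int_0^t \left(|b_s^{(n)} - b_s| + |b_s - b_s^{(m)}|\right)^2 ds\right] && \text{(triangle inequality)} \\ &\le 2E\left[\int_0^t |b_s^{(n)} - b_s|^2\,ds\right] + 2E\left[\int_0^t |b_s - b_s^{(m)}|^2\,ds\right] && ((a+b)^2 \le 2(a^2 + b^2)), \end{aligned}$$

which approaches zero as $m, n \to \infty$, by Theorem 10.8. This guarantees that the sequence $(I_t^{(n)})_{n=1}^\infty$ is a Cauchy sequence in $L^2(\Omega, \mathcal{F}, P)$ and so, since $L^2$ is complete, it has a limit.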
10.5 Properties of the general Ito integral

The general Ito integral is

$$I_t = \int_0^t b_s\,dW_s,$$

where $b$ is any adapted, square-integrable process. Its properties are inherited from the properties of Ito integrals of simple processes and are summarised below.

Adaptedness For each $t \ge 0$, $I_t$ is $\mathcal{F}_t$-measurable;

Linearity If

$$I_t = \int_0^t b_s\,dW_s, \qquad J_t = \int_0^t a_s\,dW_s,$$

then for $\alpha, \beta \in \mathbb{R}$,

$$\alpha I_t + \beta J_t = \int_0^t (\alpha b_s + \beta a_s)\,dW_s.$$

Martingale property $(I_t)_{t\ge 0}$ is a martingale.

In fact, we have the converse result, known as the martingale representation theorem (which we do not prove, see [11] for example).

Ito and martingale representation theorems for Brownian motion*

Theorem 10.9 (Ito representation theorem for Brownian motion). Let $(W_t)_{t\ge 0}$ be a Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t\ge 0}, P)$, with $(\mathcal{F}_t)_{t\ge 0}$ the natural filtration $\mathcal{F}_t = \sigma(W_s, 0 \le s \le t)$. Fix $t > 0$ and suppose that $X \in L^2(\Omega, \mathcal{F}_t, P)$ (i.e. $X$ is $\mathcal{F}_t$-measurable and $E[X^2] < \infty$). Then there exists an adapted process $b$ such that $E\int_0^t b_s^2\,ds < \infty$ and

$$X = E[X] + \int_0^t b_s\,dW_s.$$
Jo 

Theorem 10.10 (Martingale representation theorem for Brownian motion). Let $(W_t)_{t\ge 0}$ be a Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{t\ge 0}, P)$, with $(\mathcal{F}_t)_{t\ge 0}$ the natural filtration $\mathcal{F}_t = \sigma(W_s, 0 \le s \le t)$. Suppose that the process $M = (M_t)_{t\ge 0}$ is a square-integrable martingale with respect to this filtration, written $M \in \mathcal{M}_2$ (that is, $M_t \in L^2(\Omega, \mathcal{F}_t, P)$ for all $t \ge 0$, or $E[M_t^2] < \infty$ for all $t \ge 0$). Then there exists an adapted process $b$ such that $E\left[\int_0^t b_s^2\,ds\right] < \infty$, $t \ge 0$, and

$$M_t = M_0 + \int_0^t b_s\,dW_s.$$
The Ito isometry The variance of the Ito integral is $\mathrm{var}(I_t) = E[I_t^2]$, given by

$$E[I_t^2] = E\left[\int_0^t b_s^2\,ds\right].$$

Continuity $I_t$ is a continuous function of the upper limit of integration $t$.

Quadratic variation The Ito integral

$$I_t = \int_0^t b_s\,dW_s, \quad t \ge 0,$$

has quadratic variation process $([I]_t)_{t\ge 0}$ given by

$$[I]_t = \int_0^t b_s^2\,ds.$$

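Example 10.11. Consider the Ito integral

$$I_t = \int_0^t W_s\,dW_s.$$

We approximate the integrand by an elementary process $b_s^{(n)}$, $s \in [0,t]$, in the following way. Partition the interval $[0,t]$ into $n$ time intervals of length $\delta t$, so that $t = n\,\delta t$, and

$$0 = t_0 < t_1 = \delta t = \frac{t}{n} < \dots < t_k = k\,\delta t = \frac{kt}{n} < \dots < t_n = t,$$

and define $b^{(n)}$ by

$$b_s^{(n)} = W_{t_k} = W_{kt/n}, \quad \text{if } \frac{kt}{n} \le s < \frac{(k+1)t}{n}, \quad k = 0, \dots, n-1.$$

Then by definition

$$I_t = \int_0^t W_s\,dW_s = \lim_{n\to\infty}\sum_{k=0}^{n-1} W_{kt/n}\left(W_{(k+1)t/n} - W_{kt/n}\right).$$

To simplify notation, write $W_k = W_{kt/n}$, so that

$$\int_0^t W_s\,dW_s = \lim_{n\to\infty}\sum_{k=0}^{n-1} W_k(W_{k+1} - W_k).$$

Then we note that

$$\begin{aligned} W_{k+1}^2 - W_k^2 &= (W_{k+1} - W_k)^2 + 2W_kW_{k+1} - 2W_k^2 \\ &= (W_{k+1} - W_k)^2 + 2W_k(W_{k+1} - W_k), \end{aligned}$$

so that

$$\begin{aligned} \sum_{k=0}^{n-1} W_k(W_{k+1} - W_k) &= \frac{1}{2}\left[\sum_{k=0}^{n-1}(W_{k+1}^2 - W_k^2) - \sum_{k=0}^{n-1}(W_{k+1} - W_k)^2\right] \\ &= \frac{1}{2}\left[W_t^2 - \sum_{k=0}^{n-1}(W_{k+1} - W_k)^2\right] \quad (W_0 = 0). \end{aligned}$$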
Now we let $n \to \infty$ and use the definition of quadratic variation to get

$$\int_0^t W_s\,dW_s = \frac{1}{2}\left(W_t^2 - [W]_t\right) = \frac{1}{2}\left(W_t^2 - t\right).$$

Remark 10.12 (Reason for the $\frac{1}{2}t$ term). If $f$ is a differentiable function with $f(0) = 0$, then

$$\int_0^t f(s)\,df(s) = \int_0^t f(s)f'(s)\,ds = \frac{1}{2}f(s)^2\Big|_0^t = \frac{1}{2}f(t)^2.$$

In contrast, for Brownian motion, we have

$$\int_0^t W_s\,dW_s = \frac{1}{2}\left(W_t^2 - t\right).$$

The extra term comes from the nonzero quadratic variation of Brownian motion. It has to be there, because $E\left[\int_0^t W_s\,dW_s\right] = 0$ (the Ito integral is a martingale), but $E\left[\frac{1}{2}W_t^2\right] = \frac{1}{2}t$. Note that this remark is equivalent to our initial characterisation of Brownian motion in Remark 9.6, since it says that $W_t^2 - t = 2\int_0^t W_s\,dW_s$ is a martingale; see
Remark 9.6.
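The algebra of Example 10.11 can be checked on a simulated path. The left-endpoint sums $\sum_k W_k(W_{k+1} - W_k)$ should be close to $\frac{1}{2}(W_t^2 - t)$, and not to the "ordinary calculus" value $\frac{1}{2}W_t^2$. This minimal Python sketch uses an illustrative grid size and seed and a hypothetical function name. By the identity in the example, the first two returned numbers differ by exactly $\frac{1}{2}\left(t - \sum_k (W_{k+1} - W_k)^2\right)$, up to rounding.

```python
import random

def ito_w_dw(t=1.0, n=100000, seed=8):
    """Left-endpoint sum for int_0^t W dW on one path, (W_t^2 - t) / 2, and the naive W_t^2 / 2."""
    rng = random.Random(seed)
    sd = (t / n) ** 0.5
    w, left_sum = 0.0, 0.0
    for _ in range(n):
        d = rng.gauss(0.0, sd)
        left_sum += w * d                 # integrand evaluated at the LEFT endpoint W_k
        w += d
    return left_sum, 0.5 * (w * w - t), 0.5 * w * w

ito_sum, ito_closed, naive = ito_w_dw()
print(ito_sum, ito_closed, naive)   # the first two agree closely; the naive value is off by about t/2
```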



11 The Ito formula 

11.1 Ito's formula for one Brownian motion 

We want a rule to "differentiate" expressions of the form $f(W_t)$. If $W_t$ were differentiable, then the ordinary chain rule would give

$$\frac{d}{dt}f(W_t) = f'(W_t)\,W_t',$$

which could be written in differential notation as

$$df(W_t) = f'(W_t)\,W_t'\,dt = f'(W_t)\,dW_t.$$

However, $W_t$ is not differentiable, and in particular has nonzero quadratic variation, so the correct formula has an extra term, namely,

$$df(W_t) = f'(W_t)\,dW_t + \frac{1}{2}f''(W_t)\,d[W]_t,$$

with the understanding that $d[W]_t = dt$. This is a version of Ito's formula in differential form. Integrating this, we obtain a version of Ito's formula in integral form.

Theorem 11.1 (Ito formula for one BM). If $f(x)$ is a $C^2(\mathbb{R})$ function and $t \ge 0$, then

$$f(W_t) - f(W_0) = \int_0^t f'(W_s)\,dW_s + \frac{1}{2}\int_0^t f''(W_s)\,d[W]_s. \quad (11.1)$$
Remark 11.2 (Differential versus integral forms). The mathematically meaningful form of Ito's formula is its integral form, because we have solid definitions for the integrals appearing on the RHS of (11.1). For pencil and paper computations, the more convenient form is the differential form.

Proof of Theorem 11.1. Fix $t > 0$ and let $\mathcal{P} = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0, t]$. By Taylor's theorem we have

$$\begin{aligned}
f(W_t) - f(W_0) &= \sum_{k=0}^{n-1}\left[f(W_{t_{k+1}}) - f(W_{t_k})\right]\\
&= \sum_{k=0}^{n-1}\left[f'(W_{t_k})(W_{t_{k+1}} - W_{t_k}) + \frac12 f''(W_{t_k})(W_{t_{k+1}} - W_{t_k})^2 + \cdots\right]\\
&\xrightarrow{\ \|\mathcal{P}\|\to 0\ } \int_0^t f'(W_s)\,dW_s + \frac12\int_0^t f''(W_s)\,d[W]_s,
\end{aligned}$$

with the higher-order terms disappearing (since the third variation of BM is zero by Lemma 9.17, and $\left|\sum_{k=0}^{n-1} D_k^3\right| \le \sum_{k=0}^{n-1} |D_k|^3$, where $D_k := W_{t_{k+1}} - W_{t_k}$), and with the last summation converging to a Riemann integral, as it becomes the quadratic variation of an Ito integral. That is, for the Ito integral

$$I_t = \int_0^t b_s\,dW_s = \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1} b_{t_k}(W_{t_{k+1}} - W_{t_k}),$$

we have

$$\int_0^t b_s^2\,ds = [I]_t = \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1}(I_{t_{k+1}} - I_{t_k})^2 = \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1} b_{t_k}^2 (W_{t_{k+1}} - W_{t_k})^2.$$

□

A heuristic derivation would simply state that, by Taylor's theorem,

$$df(W_t) = f'(W_t)\,dW_t + \frac12 f''(W_t)\,dt,$$

where we have used $dW_t\,dW_t = dt$ in the last term on the RHS, and higher-order terms are neglected.
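The two facts the proof relies on, that $\sum_k D_k^2$ settles near $t$ while $\sum_k |D_k|^3$ vanishes as the partition is refined, can be observed directly. The sketch below (standard-library Python, uniform partitions, an arbitrary seed; `variations` is a hypothetical helper name) prints both sums for increasingly fine grids; the third variation should shrink roughly like $1/\sqrt{n}$.

```python
import math
import random


def variations(t, n, seed):
    """Return (sum |D_k|^2, sum |D_k|^3) for the Brownian increments D_k
    on a uniform n-step partition of [0, t]."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n)
    quadratic = cubic = 0.0
    for _ in range(n):
        d = abs(rng.gauss(0.0, sd))
        quadratic += d ** 2
        cubic += d ** 3
    return quadratic, cubic


for n in (100, 10_000, 100_000):
    print(n, variations(1.0, n, seed=3))
```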

Corollary 11.3 (Ito formula for function of time and one Brownian motion). If $S_t = f(t, W_t)$ for some $C^{1,2}(\mathbb{R}_+ \times \mathbb{R})$ function $f(t,x)$, then

$$dS_t = df(t, W_t) = f_t(t, W_t)\,dt + f_x(t, W_t)\,dW_t + \frac12 f_{xx}(t, W_t)\,d[W]_t,$$

and higher-order terms do not contribute, since we have shown earlier (Remark 9.15) that we have the informal rules $dW_t\,dt = 0$ and $dt\,dt = 0$.
Definition 11.4 (Geometric Brownian motion). Geometric Brownian motion is the process $S = (S_t)_{t\ge 0}$ given by

$$S_t = S_0 \exp\left(\sigma W_t + \left(\mu - \frac12\sigma^2\right)t\right),$$

where $\mu$ and $\sigma > 0$ are constant, and the parameter $\sigma$ is called the volatility of the process $S$.



Define

$$f(t,x) = S_0 \exp\left\{\sigma x + \left(\mu - \frac12\sigma^2\right)t\right\},$$

so that $S_t = f(t, W_t)$ and

$$f_t(t,x) = \left(\mu - \frac12\sigma^2\right)f(t,x), \qquad f_x(t,x) = \sigma f(t,x), \qquad f_{xx}(t,x) = \sigma^2 f(t,x),$$

with the subscripts denoting partial derivatives. Then by Ito's formula 

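$$\begin{aligned}
dS_t &= df(t, W_t)\\
&= f_t(t, W_t)\,dt + f_x(t, W_t)\,dW_t + \frac12 f_{xx}(t, W_t)\,dt\\
&= \left(\mu - \frac12\sigma^2\right)f(t, W_t)\,dt + \sigma f(t, W_t)\,dW_t + \frac12\sigma^2 f(t, W_t)\,dt\\
&= \left(\mu - \frac12\sigma^2\right)S_t\,dt + \sigma S_t\,dW_t + \frac12\sigma^2 S_t\,dt\\
&= \mu S_t\,dt + \sigma S_t\,dW_t,
\end{aligned}$$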

which is geometric Brownian motion in differential form. Geometric Brownian motion in integral form may be written as

$$S_t = S_0 + \int_0^t \mu S_s\,ds + \int_0^t \sigma S_s\,dW_s.$$
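The differential and integral forms can be compared with the closed form of Definition 11.4 by driving both with the same Brownian increments. In this sketch the integral form is discretised with a simple Euler scheme (an assumed choice of discretisation, not one prescribed by the notes), and the parameter values, seed and helper name `gbm_exact_and_euler` are hypothetical; the two values of $S_t$ should agree more closely as the step size shrinks.

```python
import math
import random


def gbm_exact_and_euler(s0, mu, sigma, t, n, seed):
    """Return (S_t from the closed form, S_t from an Euler scheme for
    dS = mu S dt + sigma S dW), both driven by the same Brownian increments."""
    rng = random.Random(seed)
    dt = t / n
    w = 0.0
    s_euler = s0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        s_euler += mu * s_euler * dt + sigma * s_euler * dw   # one Euler step
        w += dw
    s_exact = s0 * math.exp(sigma * w + (mu - 0.5 * sigma**2) * t)  # Definition 11.4
    return s_exact, s_euler


s_exact, s_euler = gbm_exact_and_euler(100.0, 0.05, 0.2, 1.0, 100_000, seed=11)
print(s_exact, s_euler)
```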



Quadratic variation of geometric Brownian motion. In the integral form of geometric Brownian motion,

$$S_t = S_0 + \int_0^t \mu S_s\,ds + \int_0^t \sigma S_s\,dW_s,$$

the Riemann integral

$$F(t) = \int_0^t \mu S_s\,ds$$

is differentiable with $F'(t) = \mu S_t$. This term has zero quadratic variation. The Ito integral

$$G(t) = \int_0^t \sigma S_s\,dW_s$$

is not differentiable. It has quadratic variation

$$[G]_t = \int_0^t \sigma^2 S_s^2\,ds.$$

Thus the quadratic variation of $S$ is given by the quadratic variation of $G$ (the differentiable term $F$ contributes neither to the quadratic variation nor to the cross-variation with $G$), i.e.

$$[S]_t = [G]_t = \int_0^t \sigma^2 S_s^2\,ds.$$
Jo 

In differential notation we write

$$d[S]_t = dS_t\,dS_t = \sigma^2 S_t^2\,dt,$$

which follows from the following informal multiplication rules involving the differentials $dt$ and $dW_t$:

$$d[W]_t = dW_t\,dW_t = dt, \qquad dW_t\,dt = dt\,dW_t = dt\,dt = 0.$$



Remark 11.5. Note that

$$\int_0^t \frac{d[S]_s}{S_s^2} = \int_0^t \sigma^2\,ds = \sigma^2 t,$$

indicating that for geometric Brownian motion, the quadratic variation, when scaled by the square of the stock price process, is a measure of the volatility of the process $S$.
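Remark 11.5 suggests a way to read off $\sigma$ from a single sampled path. The following sketch (standard-library Python; the parameters $S_0 = 100$, $\mu = 0.05$, $\sigma = 0.2$, the seed and the helper name `gbm_variation_sums` are hypothetical) samples geometric Brownian motion exactly on a grid, compares the discrete quadratic variation $\sum (\Delta S)^2$ with a Riemann sum for $\int_0^t \sigma^2 S_s^2\,ds$, and uses $\sum (\Delta S/S)^2 \approx \sigma^2 t$ to estimate $\sigma$.

```python
import math
import random


def gbm_variation_sums(s0, mu, sigma, t, n, seed):
    """Sample GBM exactly on a uniform grid and return
    (sum (dS)^2, Riemann sum of sigma^2 S^2 dt, sum (dS / S)^2)."""
    rng = random.Random(seed)
    dt = t / n
    s = s0
    quad_var = riemann = scaled = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        s_next = s * math.exp(sigma * dw + (mu - 0.5 * sigma**2) * dt)  # exact GBM step
        ds = s_next - s
        quad_var += ds * ds                 # approximates [S]_t
        riemann += sigma**2 * s * s * dt    # approximates int_0^t sigma^2 S_s^2 ds
        scaled += (ds / s) ** 2             # approximates int_0^t d[S]_s / S_s^2 = sigma^2 t
        s = s_next
    return quad_var, riemann, scaled


t = 1.0
qv, integral, scaled = gbm_variation_sums(100.0, 0.05, 0.2, t, 100_000, seed=5)
sigma_hat = math.sqrt(scaled / t)
print(qv, integral, sigma_hat)
```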

11.2 Ito's formula for Ito processes 

Definition 11.6 (Ito process). Let $(W_t, \mathcal{F}_t)_{t\ge 0}$ be a standard Brownian motion. An Ito process is a stochastic process of the form

$$X_t = X_0 + \int_0^t a_s\,ds + \int_0^t b_s\,dW_s, \qquad t \ge 0, \qquad (11.2)$$

where $X_0$ is non-random and $a$, $b$ are adapted stochastic processes satisfying $\int_0^t |a_s|\,ds < \infty$ and $\mathbb{E}\left[\int_0^t b_s^2\,ds\right] < \infty$ for every $t \ge 0$.

In differential form we write (11.2) as

$$dX_t = a_t\,dt + b_t\,dW_t.$$

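Lemma 11.7 (Quadratic variation of an Ito process). The quadratic variation of the Ito process (11.2) is the process $([X]_t)_{t\ge 0}$ given by

$$[X]_t = \int_0^t b_s^2\,ds, \qquad t \ge 0.$$

Proof. This is immediate from the fact that the quadratic variation of $\int_0^t a_s\,ds$ is zero, while the Ito integral $\int_0^t b_s\,dW_s$ has quadratic variation $\int_0^t b_s^2\,ds$, as shown in the proof of Theorem 11.1.

□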
Definition 11.8 (Integral with respect to an Ito process). Let $(X_t)_{t\ge 0}$ be the Ito process (11.2) and let $(\theta_t)_{t\ge 0}$ be an adapted process satisfying

$$\mathbb{E}\left[\int_0^t \theta_s^2 b_s^2\,ds\right] < \infty, \qquad \int_0^t |\theta_s a_s|\,ds < \infty,$$

for every $t \ge 0$. The Ito integral of $\theta$ with respect to $X$ is the process $J = (J_t)_{t\ge 0}$ defined by

$$J_t := \int_0^t \theta_s\,dX_s := \int_0^t \theta_s b_s\,dW_s + \int_0^t \theta_s a_s\,ds.$$

Theorem 11.9 (Ito formula for Ito processes). Let $(X_t)_{t\ge 0}$ be the Ito process (11.2) and let $f(t,x) \in C^{1,2}([0,\infty) \times \mathbb{R})$. Then, for every $t \ge 0$,

$$\begin{aligned}
f(t, X_t) &= f(0, X_0) + \int_0^t f_t(s, X_s)\,ds + \int_0^t f_x(s, X_s)\,dX_s + \frac12\int_0^t f_{xx}(s, X_s)\,d[X]_s\\
&= f(0, X_0) + \int_0^t \left(f_t(s, X_s) + a_s f_x(s, X_s) + \frac12 b_s^2 f_{xx}(s, X_s)\right)ds + \int_0^t b_s f_x(s, X_s)\,dW_s.
\end{aligned}$$

Proof. As for the Ito formula with respect to BM, using the fact that $[X]_t = [I]_t = \int_0^t b_s^2\,ds$, where $I_t = \int_0^t b_s\,dW_s$.

□



It is usually easier to remember and use this theorem in the differential form

$$df(t, X_t) = f_t(t, X_t)\,dt + f_x(t, X_t)\,dX_t + \frac12 f_{xx}(t, X_t)\,d[X]_t,$$

where $d[X]_t = dX_t\,dX_t$ is computed according to the rules

$$dt\,dt = dt\,dW_t = dW_t\,dt = 0, \qquad dW_t\,dW_t = dt.$$
Example 11.10 (Generalised geometric Brownian motion). Define the Ito process

$$X_t = \int_0^t \sigma_s\,dW_s + \int_0^t \left(\mu_s - \frac12\sigma_s^2\right)ds, \qquad t \ge 0,$$

where $\mu$, $\sigma$ are adapted processes. Then

$$d[X]_t = \sigma_t^2\,d[W]_t = \sigma_t^2\,dt.$$

A common model for an asset price process $S = (S_t)_{t\ge 0}$ is given by

$$S_t = S_0 e^{X_t}, \qquad dX_t = \sigma_t\,dW_t + \left(\mu_t - \frac12\sigma_t^2\right)dt,$$
with $S_0 > 0$ non-random, which is called a generalised geometric Brownian motion. We write $S_t = f(X_t)$ where $f(x) = S_0 e^x$, so that $f' = f'' = f$. The Ito formula gives

$$dS_t = S_t\,dX_t + \frac12 S_t\,d[X]_t = \mu_t S_t\,dt + \sigma_t S_t\,dW_t.$$

Applying the Ito formula to the function $g(t, S_t) = \log S_t$, we find that

$$d(\log S_t) = \frac{dS_t}{S_t} - \frac12\frac{d[S]_t}{S_t^2} = dX_t = \sigma_t\,dW_t + \left(\mu_t - \frac12\sigma_t^2\right)dt.$$

11.3 Stochastic differential equations 

Let $X := (X_t)_{t\ge 0}$ be an Ito process satisfying

$$X_t = x + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dW_s,$$

which we usually write in differential form

$$dX_t = \mu_t\,dt + \sigma_t\,dW_t. \qquad (11.3)$$

Given a function $f \in C^{1,2}([0,\infty) \times \mathbb{R})$ (that is, $f = f(t,x)$ for $t \in [0,\infty)$, $x \in \mathbb{R}$, $f : [0,\infty) \times \mathbb{R} \to \mathbb{R}$, continuously differentiable at least once with respect to $t$ and at least twice with respect to $x$), the process $(Y_t)_{t\ge 0}$ defined by $Y_t := f(t, X_t)$ has differential given by

$$dY_t = df(t, X_t) = f_t(t, X_t)\,dt + f_x(t, X_t)\,dX_t + \frac12 f_{xx}(t, X_t)\,d[X]_t,$$

where $d[X]_t = dX_t\,dX_t$ is computed according to the rules

$$dt\,dt = dt\,dW_t = dW_t\,dt = 0, \qquad dW_t\,dW_t = dt.$$

In integral form $Y_t$ is given by

$$Y_t = Y_0 + \int_0^t \left(f_t(s, X_s) + \mu_s f_x(s, X_s) + \frac12\sigma_s^2 f_{xx}(s, X_s)\right)ds + \int_0^t \sigma_s f_x(s, X_s)\,dW_s.$$
11.3.1 Markovian diffusions 

If, in (11.3), we have $\mu_t = \mu(t, X_t)$, $\sigma_t = \sigma(t, X_t)$ for well-behaved (see precise conditions later) functions $\mu(t,x)$, $\sigma(t,x)$, so that

$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t,$$

which is called a stochastic differential equation (SDE) for $X$, then the process $X$ is Markovian:

$$\mathbb{E}[h(X_T) \mid \mathcal{F}_t] = \mathbb{E}[h(X_T) \mid X_t], \qquad 0 \le t \le T,$$

and the integral equation for $Y$ may be written

$$Y_t = Y_0 + \int_0^t \left(f_t(s, X_s) + \mathcal{A}f(s, X_s)\right)ds + \int_0^t \sigma(s, X_s) f_x(s, X_s)\,dW_s,$$

where $\mathcal{A}$ is called the generator of the diffusion $X$, and is defined by

$$\mathcal{A}f(t,x) := \mu(t,x) f_x(t,x) + \frac12\sigma^2(t,x) f_{xx}(t,x).$$
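The generator can be evaluated numerically once $\mu$, $\sigma$ and $f$ are given. This sketch approximates $\mathcal{A}f(t,x)$ with central finite differences in $x$; the function name `generator`, the step size `h` and the example coefficients (those of geometric Brownian motion, with $f(t,x) = x^2$, so that $\mathcal{A}f = (2\mu + \sigma^2)x^2$) are illustrative assumptions.

```python
def generator(mu, sigma, f, t, x, h=1e-3):
    """Approximate (A f)(t, x) = mu(t, x) f_x(t, x) + 0.5 * sigma(t, x)**2 * f_xx(t, x)
    with central finite differences in x (step h)."""
    f_x = (f(t, x + h) - f(t, x - h)) / (2.0 * h)
    f_xx = (f(t, x + h) - 2.0 * f(t, x) + f(t, x - h)) / (h * h)
    return mu(t, x) * f_x + 0.5 * sigma(t, x) ** 2 * f_xx


# Hypothetical coefficients (those of geometric Brownian motion) and test function.
def mu(t, x):
    return 0.05 * x


def sigma(t, x):
    return 0.2 * x


def f(t, x):
    return x * x


x = 3.0
approx = generator(mu, sigma, f, 0.0, x)
exact = (2 * 0.05 + 0.2**2) * x * x   # mu x * 2x + 0.5 * sigma^2 x^2 * 2
print(approx, exact)
```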
11.3.2 Solutions to stochastic differential equations

We ask whether there exists a well-defined process $X$ satisfying the stochastic differential equation (SDE)

$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, \qquad (11.4)$$

or, more precisely, whether there exists a process $X$ satisfying $X_0 = x$ and

$$X_t = x + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s, \qquad t \ge 0.$$

The basic existence result is as follows. Suppose there is a constant $K$ such that for all $x, y, t$ we have

$$|\mu(t,x) - \mu(t,y)| \le K|x - y|, \qquad |\sigma(t,x) - \sigma(t,y)| \le K|x - y|, \qquad |\mu(t,x)| + |\sigma(t,x)| \le K(1 + |x|).$$

(The first two conditions are Lipschitz continuity in $x$; the third is a linear growth condition.) Then the SDE (11.4) has a unique, adapted, continuous Markovian solution, and there exists a constant $C$ such that

$$\mathbb{E}\left[|X_t|^2\right] \le C e^{Ct}\left(1 + |x|^2\right).$$
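The existence result says nothing about how to compute a solution. A common way to approximate one, not discussed in the notes, is the Euler-Maruyama scheme $X_{k+1} = X_k + \mu(t_k, X_k)\,\Delta t + \sigma(t_k, X_k)\,\Delta W_k$. The sketch below implements it in standard-library Python and checks it on geometric Brownian motion, whose coefficients are Lipschitz with linear growth and whose mean $\mathbb{E}[S_t] = S_0 e^{\mu t}$ is known; the parameter values, the number of paths and the function names are hypothetical.

```python
import math
import random


def euler_maruyama(mu, sigma, x0, t, n, rng):
    """One Euler-Maruyama path for dX = mu(s, X) ds + sigma(s, X) dW on [0, t]; returns X_t."""
    dt = t / n
    x, s = x0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += mu(s, x) * dt + sigma(s, x) * dw
        s += dt
    return x


# Hypothetical test case: GBM coefficients, which are Lipschitz with linear growth.
def gbm_drift(s, x):
    return 0.05 * x


def gbm_vol(s, x):
    return 0.2 * x


rng = random.Random(123)
paths = 10_000
mean_x1 = sum(euler_maruyama(gbm_drift, gbm_vol, 1.0, 1.0, 50, rng) for _ in range(paths)) / paths
print(mean_x1, math.exp(0.05))   # E[S_1] = S_0 e^{mu} for geometric Brownian motion
```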

Example 11.11 (Exponential martingales). Let $\theta$ be a process adapted to the filtration of the Brownian motion $W$. Define the process $Z = (Z_t)_{0\le t\le T}$ by

$$Z_t = \exp\left(-\int_0^t \theta_s\,dW_s - \frac12\int_0^t \theta_s^2\,d[W]_s\right).$$

In Problem Sheet 2 we show via the Ito formula that

$$dZ_t = -\theta_t Z_t\,dW_t,$$

so $Z$ is the Ito process given by

$$Z_t = 1 - \int_0^t \theta_s Z_s\,dW_s, \qquad t \ge 0,$$

and $Z$ is a martingale provided that $\mathbb{E}\left[\int_0^T \theta_t^2 Z_t^2\,dt\right] < \infty$.
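For a constant $\theta$ (which trivially satisfies the condition of Remark 11.12), both claims in the example can be checked numerically: the closed form of $Z_t$ and an Euler discretisation of $dZ_t = -\theta Z_t\,dW_t$ should agree along a path, and the sample mean of $Z_t$ should be close to $Z_0 = 1$. This is a sketch with hypothetical parameters $\theta = 0.5$, $t = 1$, an arbitrary seed and a hypothetical helper name.

```python
import math
import random


def exponential_martingale_path(theta, t, n, rng):
    """Constant theta: return (closed-form Z_t, Euler approximation of dZ = -theta Z dW) on one path."""
    dt = t / n
    w, z_euler = 0.0, 1.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        z_euler -= theta * z_euler * dw     # Euler step for dZ_t = -theta Z_t dW_t
        w += dw
    z_exact = math.exp(-theta * w - 0.5 * theta**2 * t)
    return z_exact, z_euler


rng = random.Random(2024)
theta, t = 0.5, 1.0
z_exact, z_euler = exponential_martingale_path(theta, t, 50_000, rng)

# Monte Carlo estimate of E[Z_t], sampling W_t ~ N(0, t) directly.
samples = 100_000
mean_z = sum(math.exp(-theta * rng.gauss(0.0, math.sqrt(t)) - 0.5 * theta**2 * t)
             for _ in range(samples)) / samples
print(z_exact, z_euler, mean_z)
```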



Remark 11.12 (Novikov condition). A sufficient condition for $Z$ to be a martingale is the Novikov condition

$$\mathbb{E}\left[\exp\left(\frac12\int_0^T \theta_t^2\,dt\right)\right] < \infty.$$



11.4 Multidimensional Brownian motion 

Definition 11.13 (d-dimensional Brownian motion). A $d$-dimensional Brownian motion is a process

$$W_t = (W_t^1, \ldots, W_t^d)$$

with the following properties:

• Each $W_t^i$ ($i = 1, \ldots, d$) is a one-dimensional Brownian motion;

• If $i \ne j$, then the processes $W_t^i$ and $W_t^j$ are independent.

Associated with a $d$-dimensional Brownian motion, we have a filtration $(\mathcal{F}_t)_{t\ge 0}$ such that:

• For each $t$, the random vector $W_t$ is $\mathcal{F}_t$-measurable;

• For each $t \le t_1 < \cdots < t_n$, the vector increments

$$W_{t_1} - W_t, \ \ldots, \ W_{t_n} - W_{t_{n-1}}$$

are independent of $\mathcal{F}_t$.
11.4.1 Cross-variations of Brownian motions 

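Because each component $W_t^i$ of $W_t$ is a one-dimensional Brownian motion, we have

$$[W^i]_t = t, \qquad i = 1, \ldots, d.$$

However, if we define the cross-variation between $W^i$ and $W^j$ as

$$[W^i, W^j]_t := \lim_{\|\mathcal{P}\|\to 0}\sum_{k=0}^{n-1}\left(W_{t_{k+1}}^i - W_{t_k}^i\right)\left(W_{t_{k+1}}^j - W_{t_k}^j\right), \qquad i, j = 1, \ldots, d,$$

where $\mathcal{P} = \{t_0, t_1, \ldots, t_n\}$ is a partition of $[0, t]$, then we have: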
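Theorem 11.14. If $i \ne j$, then

$$[W^i, W^j]_t = 0.$$

Proof. Let $\mathcal{P} = \{t_0, t_1, \ldots, t_n\}$ be a partition of $[0, t]$. For $i \ne j$, define the sample cross-variation of $W^i$ and $W^j$ on $[0, t]$ to be

$$C_{\mathcal{P}} := \sum_{k=0}^{n-1}\left(W_{t_{k+1}}^i - W_{t_k}^i\right)\left(W_{t_{k+1}}^j - W_{t_k}^j\right).$$

The increments appearing on the RHS of the above equation are all independent of one another and all have mean zero. Therefore

$$\mathbb{E}[C_{\mathcal{P}}] = 0.$$

We compute $\mathrm{var}(C_{\mathcal{P}}) = \mathbb{E}[C_{\mathcal{P}}^2]$. First note that

$$\begin{aligned}
C_{\mathcal{P}}^2 = {} & \sum_{k=0}^{n-1}\left(W_{t_{k+1}}^i - W_{t_k}^i\right)^2\left(W_{t_{k+1}}^j - W_{t_k}^j\right)^2\\
& + 2\sum_{l<k}\left(W_{t_{l+1}}^i - W_{t_l}^i\right)\left(W_{t_{l+1}}^j - W_{t_l}^j\right)\left(W_{t_{k+1}}^i - W_{t_k}^i\right)\left(W_{t_{k+1}}^j - W_{t_k}^j\right).
\end{aligned}$$

All the increments appearing in the sum of cross terms are independent of one another and have mean zero. Therefore

$$\mathrm{var}(C_{\mathcal{P}}) = \mathbb{E}[C_{\mathcal{P}}^2] = \sum_{k=0}^{n-1}\mathbb{E}\left[\left(W_{t_{k+1}}^i - W_{t_k}^i\right)^2\left(W_{t_{k+1}}^j - W_{t_k}^j\right)^2\right].$$

But $\left(W_{t_{k+1}}^i - W_{t_k}^i\right)^2$ and $\left(W_{t_{k+1}}^j - W_{t_k}^j\right)^2$ are independent of one another, and each has expectation $t_{k+1} - t_k$. It follows that

$$\mathrm{var}(C_{\mathcal{P}}) = \sum_{k=0}^{n-1}(t_{k+1} - t_k)^2 \le \|\mathcal{P}\|\sum_{k=0}^{n-1}(t_{k+1} - t_k) = \|\mathcal{P}\|\,t.$$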

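As $\|\mathcal{P}\| \to 0$ we have $\mathrm{var}(C_{\mathcal{P}}) \to 0$, so $C_{\mathcal{P}}$ converges in mean square to the constant $\mathbb{E}[C_{\mathcal{P}}] = 0$. (The convergence also holds almost surely, though we do not prove this here.)

□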
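The variance computation in the proof can be checked by simulation. The sketch below draws many realisations of $C_{\mathcal{P}}$ on uniform partitions with $n$ steps, for which $\sum_k (t_{k+1}-t_k)^2 = t^2/n$; the sample mean should be near 0 and the sample variance near $t^2/n$. Grid sizes, sample counts, the seed and the helper name `sample_cross_variation` are arbitrary choices.

```python
import math
import random


def sample_cross_variation(t, n, rng):
    """C_P for two independent Brownian motions on the uniform partition of [0, t] into n steps."""
    sd = math.sqrt(t / n)
    return sum(rng.gauss(0.0, sd) * rng.gauss(0.0, sd) for _ in range(n))


rng = random.Random(9)
t, draws_per_grid = 1.0, 1_000
results = {}
for n in (10, 100, 1_000):
    draws = [sample_cross_variation(t, n, rng) for _ in range(draws_per_grid)]
    mean = sum(draws) / draws_per_grid
    var = sum((c - mean) ** 2 for c in draws) / (draws_per_grid - 1)
    results[n] = (mean, var)
    print(n, mean, var, t * t / n)   # theory: var(C_P) = sum (t_{k+1} - t_k)^2 = t^2 / n
```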

11.4.2 Levy's characterisation of Brownian motion* 

Levy's characterisation of BM (as given in Theorem 9.18) extends to the multi-dimensional case (see Shreve [13], Section 4.6.3, for more details).

Theorem 11.15 (Levy's theorem, d-dimensional). Let $M = (M^1, \ldots, M^d)$ be a $d$-dimensional martingale relative to a filtration, with $M_0 = 0$, continuous paths, and $[M^i, M^j]_t = \delta_{ij} t$ for all $t \ge 0$ and $i, j = 1, \ldots, d$, where $\delta_{ij}$ equals 1 if $i = j$ and 0 otherwise. Then $M$ is a $d$-dimensional BM.

11.5 Two-dimensional Ito formula 

There is a multi-dimensional version of the Ito formula. We content ourselves for now with the following two-dimensional version. The formula generalises (as we shall see) to any number of processes driven by a Brownian motion of any number (not necessarily the same number) of dimensions. Let $W := (W^1, W^2)$ be a two-dimensional Brownian motion (so that $W^1$, $W^2$ are independent Brownian motions), and let $X := (X^1, X^2)$ be a two-dimensional Ito process following

$$dX_t = a_t\,dt + b_t\,dW_t, \qquad (11.5)$$

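where

$$X_t = \begin{pmatrix} X_t^1 \\ X_t^2 \end{pmatrix}, \qquad a_t = \begin{pmatrix} a_t^1 \\ a_t^2 \end{pmatrix}, \qquad b_t = \begin{pmatrix} b_t^{11} & b_t^{12} \\ b_t^{21} & b_t^{22} \end{pmatrix}, \qquad W_t = \begin{pmatrix} W_t^1 \\ W_t^2 \end{pmatrix},$$

so that (11.5) is equivalent to

$$\begin{aligned}
dX_t^1 &= a_t^1\,dt + b_t^{11}\,dW_t^1 + b_t^{12}\,dW_t^2,\\
dX_t^2 &= a_t^2\,dt + b_t^{21}\,dW_t^1 + b_t^{22}\,dW_t^2,
\end{aligned}$$

or in integral form

$$\begin{aligned}
X_t^1 &= x^1 + \int_0^t \left(a_s^1\,ds + b_s^{11}\,dW_s^1 + b_s^{12}\,dW_s^2\right),\\
X_t^2 &= x^2 + \int_0^t \left(a_s^2\,ds + b_s^{21}\,dW_s^1 + b_s^{22}\,dW_s^2\right),
\end{aligned}$$

or in compact form

$$X_t = x + \int_0^t a_s\,ds + \int_0^t b_s\,dW_s, \qquad t \ge 0, \qquad x = \begin{pmatrix} x^1 \\ x^2 \end{pmatrix}.$$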



Such processes, consisting of a nonrandom initial condition, plus a Riemann integral, plus 
one or more Ito integrals, are examples of semimartingales. The integrands a s , b s can be any 
adapted processes such that the relevant integrals exist. The adaptedness of the integrands 
guarantees that X is also adapted. 

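Theorem 11.16 (Two-dimensional Ito formula). Let $f(t, x_1, x_2)$ be a $C^{1,2}$ function $f : [0,\infty) \times \mathbb{R}^2 \to \mathbb{R}$. Then the process $Y := (Y_t)_{t\ge 0}$ defined by $Y_t := f(t, X_t^1, X_t^2) = f(t, X_t)$ follows

$$\begin{aligned}
dY_t = {} & f_t(t, X_t^1, X_t^2)\,dt + f_{x_1}(t, X_t^1, X_t^2)\,dX_t^1 + f_{x_2}(t, X_t^1, X_t^2)\,dX_t^2\\
& + \frac12 f_{x_1x_1}(t, X_t^1, X_t^2)\,d[X^1]_t + \frac12 f_{x_2x_2}(t, X_t^1, X_t^2)\,d[X^2]_t + f_{x_1x_2}(t, X_t^1, X_t^2)\,d[X^1, X^2]_t,
\end{aligned}$$

where $d[X^i, X^j]_t = dX_t^i\,dX_t^j$, $i, j = 1, 2$, are computed according to the rules

$$dt\,dt = dt\,dW_t^i = dW_t^i\,dt = 0, \qquad dW_t^i\,dW_t^j = \delta_{ij}\,dt,$$

with

$$\delta_{ij} = \begin{cases} 1, & i = j,\\ 0, & i \ne j. \end{cases}$$

In integral form the theorem is

$$\begin{aligned}
Y_t - Y_0 = {} & f(t, X_t^1, X_t^2) - f(0, X_0^1, X_0^2)\\
= {} & \int_0^t f_t(s, X_s^1, X_s^2)\,ds + \int_0^t f_{x_1}(s, X_s^1, X_s^2)\,dX_s^1 + \int_0^t f_{x_2}(s, X_s^1, X_s^2)\,dX_s^2\\
& + \frac12\int_0^t f_{x_1x_1}(s, X_s^1, X_s^2)\,d[X^1]_s + \frac12\int_0^t f_{x_2x_2}(s, X_s^1, X_s^2)\,d[X^2]_s\\
& + \int_0^t f_{x_1x_2}(s, X_s^1, X_s^2)\,d[X^1, X^2]_s.
\end{aligned}$$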
Jo 

11.5.1 Markovian diffusion case 

If, in (11.5), we have $a_t = a(t, X_t)$, $b_t = b(t, X_t)$ for well-behaved functions $a(t,x)$, $b(t,x)$ (Lipschitz continuity and linear growth conditions are usually sufficient), so that

$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dW_t,$$

then the process $X$ is Markovian:

$$\mathbb{E}[h(X_T) \mid \mathcal{F}_t] = \mathbb{E}[h(X_T) \mid X_t], \qquad 0 \le t \le T.$$
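The integral equation for $Y$ may be written

$$Y_t = Y_0 + \int_0^t \left(f_t(s, X_s) + \mathcal{A}f(s, X_s)\right)ds + \int_0^t \left(\nabla f(s, X_s)\right)^* b(s, X_s)\,dW_s,$$

where $\mathcal{A}$ is the generator of the two-dimensional diffusion $X$, defined by

$$\begin{aligned}
\mathcal{A}f(t,x) &:= \sum_{i=1}^2 a^i(t,x) f_{x_i}(t,x) + \frac12\sum_{i=1}^2\sum_{j=1}^2 (bb^*)^{ij}(t,x) f_{x_ix_j}(t,x)\\
&= a^*(t,x)\nabla f(t,x) + \frac12\sum_{i=1}^2\sum_{j=1}^2 (bb^*)^{ij}(t,x) f_{x_ix_j}(t,x), \qquad (11.6)
\end{aligned}$$

where $^*$ denotes matrix transposition, and where $\nabla f := (f_{x_1}, f_{x_2})^*$ denotes the gradient of $f$ with respect to $x$.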

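Exercise 11.17 (The product rule). Let $X^1$, $X^2$ be Ito processes:

$$\begin{aligned}
X_t^1 &= x^1 + \int_0^t a_s^1\,ds + \int_0^t b_s^{11}\,dW_s^1 + \int_0^t b_s^{12}\,dW_s^2,\\
X_t^2 &= x^2 + \int_0^t a_s^2\,ds + \int_0^t b_s^{21}\,dW_s^1 + \int_0^t b_s^{22}\,dW_s^2.
\end{aligned}$$

Use a two-dimensional Ito formula to derive the product rule

$$d(X_t^1 X_t^2) = X_t^1\,dX_t^2 + X_t^2\,dX_t^1 + dX_t^1\,dX_t^2,$$

or, in integral form,

$$X_t^1 X_t^2 = x^1 x^2 + \int_0^t X_s^1\,dX_s^2 + \int_0^t X_s^2\,dX_s^1 + \int_0^t d[X^1, X^2]_s.$$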
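A numerical illustration of the product rule (not a solution of the exercise): for the hypothetical pair $X^1 = W^1$, $X^2 = W^1 + W^2$ (so $x^1 = x^2 = 0$ and $[X^1, X^2]_t = t$), the discrete identity $x_1'x_2' - x_1x_2 = x_1\,\Delta x_2 + x_2\,\Delta x_1 + \Delta x_1\,\Delta x_2$ holds exactly on each step, and the sum of $\Delta x_1\,\Delta x_2$ should approach $t$. The helper name, seed and grid size are illustrative assumptions.

```python
import math
import random


def product_rule_check(t, n, seed):
    """Hypothetical example: X1 = W1 and X2 = W1 + W2 (x1 = x2 = 0), so [X1, X2]_t = t.
    Returns (X1_t X2_t, discretised RHS of the product rule, discretised [X1, X2]_t)."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n)
    x1 = x2 = 0.0
    x1_dx2 = x2_dx1 = cross = 0.0
    for _ in range(n):
        dw1, dw2 = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
        dx1, dx2 = dw1, dw1 + dw2
        x1_dx2 += x1 * dx2      # sum of X1 dX2 (left endpoint)
        x2_dx1 += x2 * dx1      # sum of X2 dX1 (left endpoint)
        cross += dx1 * dx2      # sum of dX1 dX2, approximating [X1, X2]_t
        x1 += dx1
        x2 += dx2
    return x1 * x2, x1_dx2 + x2_dx1 + cross, cross


lhs, rhs, cross = product_rule_check(1.0, 100_000, seed=17)
print(lhs, rhs, cross)
```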

11.6 Multidimensional Ito formula 
11.6.1 Multidimensional Ito process 

Let $W_t = (W_t^1, \ldots, W_t^d)$ be a vector of $d$ independent Brownian motions, that is, $W_t$ is a $d$-dimensional Brownian motion. We can use the Brownian motion vector to form the following $n$ Ito processes $X^1, \ldots, X^n$:

$$\begin{aligned}
dX_t^1 &= a_t^1\,dt + b_t^{11}\,dW_t^1 + \cdots + b_t^{1d}\,dW_t^d,\\
&\;\;\vdots\\
dX_t^n &= a_t^n\,dt + b_t^{n1}\,dW_t^1 + \cdots + b_t^{nd}\,dW_t^d,
\end{aligned}$$






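or, in matrix notation, with $X = (X^1, \ldots, X^n)^*$,

$$dX_t = a_t\,dt + b_t\,dW_t, \qquad (11.7)$$

where

$$X_t = \begin{pmatrix} X_t^1 \\ \vdots \\ X_t^n \end{pmatrix}, \qquad a_t = \begin{pmatrix} a_t^1 \\ \vdots \\ a_t^n \end{pmatrix}, \qquad b_t = \begin{pmatrix} b_t^{11} & \cdots & b_t^{1d} \\ \vdots & \ddots & \vdots \\ b_t^{n1} & \cdots & b_t^{nd} \end{pmatrix}. \qquad (11.8)$$

Note that the coefficients $a$ and $b$ are required to satisfy certain conditions so that the integrals implicit in the above equations are well defined. In particular, their elements should all be adapted processes, so that their values at time $t$ are known from the information available at time $t$.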

Theorem 11.18 (Multidimensional Ito formula). Suppose $X_t$ satisfies (11.7). Let

$$f(t,x) = \left(f^1(t,x), \ldots, f^p(t,x)\right)^*$$

be a twice differentiable map from $[0,\infty) \times \mathbb{R}^n$ into $\mathbb{R}^p$. Then the process $Y_t := f(t, X_t)$ is again an Ito process, whose $k$th component, $Y_t^k$, is given by the multidimensional Ito formula as

$$dY_t^k = \frac{\partial f^k}{\partial t}(t, X_t)\,dt + \sum_{i=1}^n \frac{\partial f^k}{\partial x_i}(t, X_t)\,dX_t^i + \frac12\sum_{i=1}^n\sum_{j=1}^n \frac{\partial^2 f^k}{\partial x_i\,\partial x_j}(t, X_t)\,d[X^i, X^j]_t, \qquad (11.9)$$

where $d[X^i, X^j]_t = dX_t^i\,dX_t^j$ is computed according to the rules

$$dW_t^i\,dW_t^j = \delta_{ij}\,dt, \qquad dt\,dt = dW_t^i\,dt = dt\,dW_t^i = 0.$$

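Example 11.19. Let $W = (W^1, \ldots, W^n)$ be Brownian motion in $\mathbb{R}^n$, for $n \ge 2$. Consider

$$R_t := |W_t| = \left((W_t^1)^2 + \cdots + (W_t^n)^2\right)^{1/2},$$

which is a process describing the distance of the $n$-dimensional Brownian motion from the origin. Now, the function $f(t,x) = |x|$ is not differentiable at the origin, but since $W_t$ never hits the origin for $t > 0$ (almost surely, or with probability one) when $n \ge 2$ (see, for example, Øksendal [11], Exercise 9.7), the multidimensional Ito formula still works.

Take $X_t = W_t$, so that $dX_t = dW_t$, and consider the process $Y_t = R_t = f(t, X_t) = f(t, W_t) = |W_t| = \left((W_t^1)^2 + \cdots + (W_t^n)^2\right)^{1/2}$. Then $f(t,x) = (x_1^2 + \cdots + x_n^2)^{1/2}$, so that

$$\frac{\partial f}{\partial t} = 0, \qquad \frac{\partial f}{\partial x_i} = \frac{x_i}{|x|}, \qquad \frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\delta_{ij}}{|x|} - \frac{x_i x_j}{|x|^3}.$$

Then

$$dR_t = \sum_{i=1}^n \frac{W_t^i}{|W_t|}\,dW_t^i + \frac12\sum_{i=1}^n\sum_{j=1}^n\left(\frac{\delta_{ij}}{|W_t|} - \frac{W_t^iW_t^j}{|W_t|^3}\right)\delta_{ij}\,dt = \sum_{i=1}^n \frac{W_t^i}{R_t}\,dW_t^i + \frac{n-1}{2R_t}\,dt.$$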
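The formula for $dR_t$ can be checked along a simulated path. To keep $|W_t|$ away from the singularity of $f(x) = |x|$ at the origin, this sketch assumes a Brownian motion in $\mathbb{R}^3$ started at the point $(3, 0, 0)$ rather than at the origin (the Ito formula is unaffected by such a shift of the starting point); the start point, horizon, step count, seed and helper names are hypothetical choices.

```python
import math
import random


def norm(v):
    """Euclidean length of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))


def bessel_check(start, t, steps, seed):
    """Drive a Brownian motion in R^n started at `start` (assumed away from the origin) and
    compare R_t - R_0 with the discretised right-hand side of the formula for dR_t."""
    rng = random.Random(seed)
    dim = len(start)
    dt = t / steps
    w = list(start)
    r0 = norm(w)
    rhs = 0.0
    for _ in range(steps):
        r = norm(w)
        dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(dim)]
        rhs += sum(wi * dwi for wi, dwi in zip(w, dw)) / r   # sum_i (W^i / R) dW^i
        rhs += (dim - 1) / (2.0 * r) * dt                     # drift term (n - 1) / (2 R) dt
        w = [wi + dwi for wi, dwi in zip(w, dw)]
    return norm(w) - r0, rhs


increment, ito_rhs = bessel_check((3.0, 0.0, 0.0), 1.0, 50_000, seed=21)
print(increment, ito_rhs)
```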
11.7 Connections with PDEs: Feynman-Kac theorem

There is a remarkable connection between stochastic calculus for Markov diffusions and partial differential equations (PDEs). Consider the one-dimensional diffusion

$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dW_t. \qquad (11.10)$$

The process $X = (X_t)_{t\ge 0}$ is a Markov process, satisfying

$$\mathbb{E}[h(X_T) \mid \mathcal{F}_t] = \mathbb{E}[h(X_T) \mid X_t], \qquad 0 \le t \le T, \qquad (11.11)$$

for a function $h(x)$ such that the above expectations are defined. A consequence of the Markov property is that the right-hand side of (11.11) is a function of $(t, X_t)$ only. Write

$$v(t,x) := \mathbb{E}[h(X_T) \mid X_t = x]. \qquad (11.12)$$

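Lemma 11.20. The process $Y = (Y_t)_{0\le t\le T}$ defined by $Y_t := v(t, X_t)$ is a martingale.

Proof. By the Markov property, we have $Y_t = \mathbb{E}[h(X_T) \mid X_t] = \mathbb{E}[h(X_T) \mid \mathcal{F}_t]$. Then, for $0 \le s \le t \le T$,

$$\begin{aligned}
\mathbb{E}[Y_t \mid \mathcal{F}_s] &= \mathbb{E}\left[\mathbb{E}[h(X_T) \mid X_t] \mid \mathcal{F}_s\right]\\
&= \mathbb{E}\left[\mathbb{E}[h(X_T) \mid \mathcal{F}_t] \mid \mathcal{F}_s\right] \qquad \text{(by the Markov property)}\\
&= \mathbb{E}[h(X_T) \mid \mathcal{F}_s] \qquad \text{(by the tower property)}\\
&= \mathbb{E}[h(X_T) \mid X_s] \qquad \text{(by the Markov property)}\\
&= Y_s.
\end{aligned}$$

□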

Theorem 11.21 (Feynman-Kac). The function $v(t,x)$ in (11.12) satisfies the PDE

$$v_t(t,x) + a(t,x)v_x(t,x) + \frac12 b^2(t,x)v_{xx}(t,x) = 0, \qquad v(T,x) = h(x). \qquad (11.13)$$

Proof. By the Ito formula,

$$\begin{aligned}
dY_t &= dv(t, X_t)\\
&= \left[v_t(t, X_t) + a(t, X_t)v_x(t, X_t) + \frac12 b^2(t, X_t)v_{xx}(t, X_t)\right]dt + b(t, X_t)v_x(t, X_t)\,dW_t.
\end{aligned}$$

Since $Y$ is a martingale, the coefficient of the "$dt$" term must be zero for all $(t, X_t)$, and the PDE in (11.13) follows. The terminal condition holds because $v(T,x) = \mathbb{E}[h(X_T) \mid X_T = x] = h(x)$.

□
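The theorem can be tested on a case where $v$ is available in closed form. Assuming (hypothetically) $a(t,x) = \mu x$ and $b(t,x) = \sigma x$, so that $X$ is geometric Brownian motion, and $h(x) = x^2$, one finds $v(t,x) = x^2 e^{(2\mu + \sigma^2)(T - t)}$. The sketch checks that this $v$ makes the left-hand side of the PDE in (11.13) vanish (up to finite-difference error) and that it agrees with a Monte Carlo estimate of $\mathbb{E}[h(X_T) \mid X_0 = x]$; parameter values, seed and function names are illustrative assumptions.

```python
import math
import random

MU, SIGMA, T = 0.05, 0.2, 1.0   # hypothetical GBM parameters: a(t,x) = MU x, b(t,x) = SIGMA x


def h(x):
    return x * x


def v(t, x):
    """Closed form of E[h(X_T) | X_t = x] for GBM and h(x) = x^2."""
    return x * x * math.exp((2 * MU + SIGMA**2) * (T - t))


def pde_residual(t, x, eps=1e-3):
    """v_t + a v_x + (1/2) b^2 v_xx, computed with central differences."""
    v_t = (v(t + eps, x) - v(t - eps, x)) / (2 * eps)
    v_x = (v(t, x + eps) - v(t, x - eps)) / (2 * eps)
    v_xx = (v(t, x + eps) - 2 * v(t, x) + v(t, x - eps)) / eps**2
    return v_t + MU * x * v_x + 0.5 * (SIGMA * x) ** 2 * v_xx


def monte_carlo_v(x, samples, seed):
    """Estimate E[h(X_T) | X_0 = x] by sampling X_T exactly from the GBM formula."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        w_t = rng.gauss(0.0, math.sqrt(T))
        total += h(x * math.exp(SIGMA * w_t + (MU - 0.5 * SIGMA**2) * T))
    return total / samples


print(pde_residual(0.5, 2.0), v(0.0, 2.0), monte_carlo_v(2.0, 100_000, seed=4))
```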

Note that the PDE (11.13) may be written

$$v_t(t,x) + \mathcal{A}v(t,x) = 0, \qquad v(T,x) = h(x), \qquad (11.14)$$

where $\mathcal{A}$ is the generator of the diffusion (11.10):

$$\mathcal{A}v(t,x) = a(t,x)v_x(t,x) + \frac12 b^2(t,x)v_{xx}(t,x)$$

(and this is the form that generalises to a multi-dimensional situation). Note also that the theorem is still valid if we replace $h(X_T)$ in (11.12) by $h(T, X_T)$, a function dependent on $T$ as well as $X_T$.

Finally, there is an obvious generalisation to a multi-dimensional situation. We content ourselves with the following two-dimensional version.

Suppose we have a two-dimensional diffusion $X = (X^1, X^2)$ following

$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dW_t, \qquad (11.15)$$

where

$$a(t, X_t) = \begin{pmatrix} a^1(t, X_t) \\ a^2(t, X_t) \end{pmatrix}, \qquad b(t, X_t) = \begin{pmatrix} b^{11}(t, X_t) & b^{12}(t, X_t) \\ b^{21}(t, X_t) & b^{22}(t, X_t) \end{pmatrix},$$

so that (11.15) is equivalent to

$$\begin{aligned}
dX_t^1 &= a^1(t, X_t)\,dt + b^{11}(t, X_t)\,dW_t^1 + b^{12}(t, X_t)\,dW_t^2,\\
dX_t^2 &= a^2(t, X_t)\,dt + b^{21}(t, X_t)\,dW_t^1 + b^{22}(t, X_t)\,dW_t^2.
\end{aligned}$$

Let $h(x) = h(x_1, x_2)$ be a function $h : \mathbb{R}^2 \to \mathbb{R}$. Define the function

$$v(t,x) := \mathbb{E}[h(X_T) \mid X_t = x]. \qquad (11.16)$$

The generator of the diffusion (11.15) is $\mathcal{A}$, given by (11.6):

$$\mathcal{A}f(t,x) := \sum_{i=1}^2 a^i(t,x) f_{x_i}(t,x) + \frac12\sum_{i=1}^2\sum_{j=1}^2 (bb^*)^{ij}(t,x) f_{x_ix_j}(t,x).$$



Theorem 11.22 (Feynman-Kac, two-dimensional). The function $v(t,x)$ in (11.16) satisfies the PDE

$$v_t(t,x) + \mathcal{A}v(t,x) = 0, \qquad v(T,x) = h(x),$$

where $\mathcal{A}$ is the generator of the diffusion (11.15).

11.8 The Girsanov Theorem* 

Given a Brownian motion $W := (W_t)_{0\le t\le T}$ on $(\Omega, \mathcal{F}, \mathbb{P})$ with the filtration $\mathbb{F} := (\mathcal{F}_t)_{0\le t\le T}$ being that generated by $W$, and given an adapted process $\theta := (\theta_t)_{0\le t\le T}$, define the (local) martingale $Z$ by

$$Z_t := \mathcal{E}(-\theta \cdot W)_t := \exp\left(-\int_0^t \theta_s\,dW_s - \frac12\int_0^t \theta_s^2\,ds\right), \qquad 0 \le t \le T,$$

where $\mathcal{E}$ is the so-called Doleans exponential. We have that $Z$ follows

$$dZ_t = -\theta_t Z_t\,dW_t.$$

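Then, provided $\theta$ satisfies the Novikov condition

$$\mathbb{E}\left[\exp\left(\frac12\int_0^T \theta_t^2\,dt\right)\right] < \infty, \qquad (11.17)$$

we can define a new probability measure $\mathbb{Q} \sim \mathbb{P}$ on $\mathcal{F} = \mathcal{F}_T$ by

$$\mathbb{Q}(A) = \int_A Z_T\,d\mathbb{P}, \qquad \forall A \in \mathcal{F},$$

and the process

$$W_t^{\mathbb{Q}} := W_t + \int_0^t \theta_s\,ds, \qquad 0 \le t \le T,$$

is a $\mathbb{Q}$-Brownian motion. We write $Z_T = \frac{d\mathbb{Q}}{d\mathbb{P}}$, and we have, for any $\mathcal{F}_T$-measurable random variable $X$,

$$\mathbb{E}^{\mathbb{Q}}[X] = \mathbb{E}[X Z_T]. \qquad (11.18)$$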

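Remark 11.23. The Novikov condition (11.17) is sufficient to guarantee that $Z$ is a $(\mathbb{P}, \mathbb{F})$-martingale, so that $\mathbb{E}[Z_t] = 1$ for all $t \in [0, T]$ (in particular $\mathbb{E}[Z_T] = 1$) and $\mathbb{Q}$ is indeed a probability measure.

As well as (11.18) we have the following results connecting expectations under $\mathbb{Q}$ and $\mathbb{P}$.

Let $0 \le t \le T$. If $X$ is $\mathcal{F}_t$-measurable, then

$$\mathbb{E}^{\mathbb{Q}}[X] = \mathbb{E}[X Z_t].$$

Bayes formula. If $X$ is $\mathcal{F}_t$-measurable and $0 \le s \le t \le T$, then

$$Z_s\,\mathbb{E}^{\mathbb{Q}}[X \mid \mathcal{F}_s] = \mathbb{E}[X Z_t \mid \mathcal{F}_s].$$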
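For a constant $\theta$ the change of measure can be checked by reweighting samples drawn under $\mathbb{P}$: by (11.18), $\mathbb{E}^{\mathbb{Q}}[W_T^{\mathbb{Q}}] = \mathbb{E}[W_T^{\mathbb{Q}} Z_T]$ should be close to 0 and $\mathbb{E}^{\mathbb{Q}}[(W_T^{\mathbb{Q}})^2] = \mathbb{E}[(W_T^{\mathbb{Q}})^2 Z_T]$ close to $T$, as they must be for a $\mathbb{Q}$-Brownian motion, while $\mathbb{E}[Z_T] \approx 1$. The parameter values, sample size, seed and the helper name `girsanov_moments` in this sketch are hypothetical.

```python
import math
import random


def girsanov_moments(theta, t, samples, seed):
    """Constant theta: estimate E[Z_T], E[W^Q_T Z_T] and E[(W^Q_T)^2 Z_T] from samples drawn under P."""
    rng = random.Random(seed)
    m0 = m1 = m2 = 0.0
    for _ in range(samples):
        w = rng.gauss(0.0, math.sqrt(t))                 # W_T under P
        z = math.exp(-theta * w - 0.5 * theta**2 * t)    # Z_T = dQ/dP
        wq = w + theta * t                               # W^Q_T = W_T + int_0^T theta ds
        m0 += z
        m1 += wq * z
        m2 += wq * wq * z
    return m0 / samples, m1 / samples, m2 / samples


m0, m1, m2 = girsanov_moments(0.5, 1.0, 200_000, seed=8)
print(m0, m1, m2)   # expected to be close to 1, 0 and T = 1
```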

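There is a multi-dimensional version of Girsanov's Theorem. Once again we content ourselves with a two-dimensional version. Given a two-dimensional Brownian motion $W = (W^1, W^2)$ on a stochastic basis $(\Omega, \mathcal{F}, \mathbb{F} := (\mathcal{F}_t)_{0\le t\le T}, \mathbb{P})$, and a two-dimensional adapted process $\theta = (\theta^1, \theta^2)$, define a (local) martingale $Z$ by

$$\begin{aligned}
Z_t &= \mathcal{E}(-\theta \cdot W)_t = \mathcal{E}(-\theta^1 \cdot W^1 - \theta^2 \cdot W^2)_t\\
&= \exp\left(-\int_0^t \theta_s^1\,dW_s^1 - \int_0^t \theta_s^2\,dW_s^2 - \frac12\int_0^t \left[(\theta_s^1)^2 + (\theta_s^2)^2\right]ds\right).
\end{aligned}$$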

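Then, provided we have the two-dimensional Novikov condition

$$\mathbb{E}\left[\exp\left(\frac12\int_0^T \left[(\theta_s^1)^2 + (\theta_s^2)^2\right]ds\right)\right] < \infty, \qquad (11.19)$$

we can define a new probability measure $\mathbb{Q} \sim \mathbb{P}$ on $\mathcal{F} = \mathcal{F}_T$ by

$$\mathbb{Q}(A) = \int_A Z_T\,d\mathbb{P}, \qquad \forall A \in \mathcal{F},$$

and the process $W^{\mathbb{Q}} = (W^{\mathbb{Q},1}, W^{\mathbb{Q},2})$ defined by

$$W_t^{\mathbb{Q},1} := W_t^1 + \int_0^t \theta_s^1\,ds, \qquad W_t^{\mathbb{Q},2} := W_t^2 + \int_0^t \theta_s^2\,ds$$

is a two-dimensional $\mathbb{Q}$-Brownian motion.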